Sophy SEM
Sophy SEM

Reputation: 223

UTF-8 problems in PostgreSQL

My database is UTF-8 (PostgreSQL). I saved 'TESTµTEST' into the database and it's OK. But when I selected this value from the database I saw 'TESTµTEST'.

Moreover, when I made a request select * from tbl where f='TESTµTEST', I got this error:

ERROR: invalid byte sequence for encoding "UTF8": 0xb5.

Would you please give me any solutions?

Upvotes: 0

Views: 6966

Answers (2)

metasequoia
metasequoia

Reputation: 7274

I ran into the same error when copying tables into PostgreSQL 9.1 which contained the same symbol (tables from the standard nutrient database v26). I recreated the database with the new encoding, but I also had to specify the appropriate locale and template.

CREATE DATABASE testdb
  WITH OWNER = postgres
  ENCODING = 'LATIN1'
  LC_COLLATE = 'eng_canada.28591'
  LC_CTYPE = 'eng_canada.28591'
  TEMPLATE = template0;

Upvotes: 0

Craig Ringer
Craig Ringer

Reputation: 324375

That error shows that you are trying to decode latin-1 text as if it were utf-8. Most likely your client_encoding setting in PHP doesn't match the encoding of the data you're actually sending.

The string "TESTµTEST" is produced by encoding data from Unicode to a utf-8 byte sequence, then decoding it as latin-1. You can see this in psql:

regress=# select convert_from(convert_to('TESTµTEST','utf-8'),'latin-1');
 convert_from 
--------------
 TESTµTEST

If the PostgreSQL database were utf-8 it would convert latin-1 input to utf-8 if client_encoding was correctly set to latin-1. If client_encoding is incorrectly set to utf-8 and you send latin-1 encoded data, PostgreSQL will refuse to accept it with the message:

invalid byte sequence for encoding "UTF8": 0xb5

... which is what happens when you run that SELECT you showed. So - I'd say your client is set to client_encoding = 'utf-8' but your PHP scripts are actually sending latin-1 data. I expect that's because, as @dezso says, you're editing your PHP scripts with a text editor that's using the latin-1 encoding.

To find out which encoding PHP is using, use a PHP database connection to run SHOW client_encoding;.

To show the database encoding, run:

SELECT d.datname, pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding" 
FROM pg_database d WHERE datname = 'my_db_name_here';

Oh, another possibility is that Apache (or whatever) expects your PHP scripts to be utf-8 encoded, but they're actually latin-1 encoded files.

Upvotes: 3

Related Questions