Reputation: 8773
I'm trying to import some data from a bunch of tables in an PostgreSQL database in another bunch of tables in another PostgreSQL database. For importing I use PHP because I need to make some data manipulation before inserting it in the new database.
All went ok until some point where there is some text which contains single quotes. I tried to escape those quotes before inserting the data and i get this error:
ERROR: invalid byte sequence for encoding "UTF8": 0x96 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
The encoding on the old database is SQL_ASCII, while on the new database I have UTF8.
How can I solve this issue?
Upvotes: 1
Views: 644
Reputation: 43380
It's likely that you have some Windows-1252-encoded text in your old database (0x96 encodes EN DASH in that encoding).
When the server encoding is SQL_ASCII
, PostgreSQL does not perform encoding conversion or validation; it just stores and retrieves text as you gave it. If your old database had used UTF8
instead, you wouldn't be having this problem. Woulda coulda shoulda.
There's a decent chance that all of your text is in WIN1252
, but it's possible that there's a mixture of encodings. The easiest thing to do is assume the former and set the client encoding to WIN1252
in your migration script:
pg_set_client_encoding($new_database, 'WIN1252');
It's important that you set this on your connection to the new database. It's telling Postgres that Windows 1252-encoded text is coming in, and Postgres needs to convert it to UTF-8.
Note that if there's any UTF-8-encoded text in the old database, it will either throw an error or become mojibake if you do this.
Upvotes: 1