André Lucas
André Lucas

Reputation: 1

How to convert old postgresql db from ascii to utf8

I have this old database from postgresql version 8.4 that my boss told me to try to convert from SQL_ASCII encoding to UTF8 encoding. I tried to pg_dump --encoding=UTF8 and I got the message "invalid byte sequence for encoding “UTF8” : 0xe76122" And searching more i tried pg_dump --encoding=ISO88591, and it worked, and I could also import it with no problems to the new empty UTF8 database that i created, but from time to time, i'm getting this message: "ERROR: character 0xc296 with encoding "UTF8" has no equivalent in "WIN1252";". Any solutions?

Upvotes: 0

Views: 766

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246798

You made a mistake in converting the database.

It must have been encoded in WINDOWS-1252, not ISO 8859-1, and there must have been an “em dash” (Unicode U+2013, code point 96 in WINDOWS-1252).

When you dumped the database with encoding LATIN1 = ISO88591 and loaded it, the byte 0x96 was interpreted as Unicode U+0096, which is 0xC296 in UTF-8. This character does not exist in WINDOWS-1252, so the conversion fails.

You have to dump and restore the database again, but this time use

pg_dump --encoding=WIN1252

Since you also have code point 0x81 in your database, it could aso be WIN1251 (Cyrillic) or WIN1256 (Arabic). Or you have some wild mix – then you must fix the data before migration.

Upvotes: 3

Related Questions