user684202


How do you convert latin1 to utf8 character encoding?

I have an SQL database dump whose character encoding is latin1, but some UTF-8 characters in the file show up garbled: Ä (should be ā), Ä« (should be ī), Å¡ (should be š), Ä“ (should be ē), etc. How do I convert these letters back to the original UTF-8?

Character in the file <-> what it should have been <-> bytes

Ä“ <-> ē <-> 5

Ä <-> ā <-> 2

Å¡ <-> š <-> 4

Ä« <-> ī <-> 4

Upvotes: 0

Views: 1763

Answers (2)

jishi

Reputation: 24604

Encoding should be set on the connection on which you import data and read out data. If both of them are set to UTF-8, you will face no problems.

If, however, you import the data over a latin1 connection and later read it out over a UTF-8 connection, you're in a world of trouble.

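PHP strings are plain byte sequences with no built-in notion of encoding, and many of the older string functions assume a single-byte encoding such as latin1; however, that isn't necessarily a problem for you.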

If you have already wrongly imported the data, you would see a lot of ? or (diamond + ?) on your output I think.

But basically, when connecting from PHP, make sure to run SET NAMES 'utf8' as the first thing you do and see if that works.

If the data is still wrong, you could use PHP's utf8_encode / utf8_decode functions to convert the problematic data.

In a working scenario they should never be used though.
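For illustration, here is a minimal sketch of the same repair idea, written in Python rather than PHP so it can be run standalone. It assumes the garbled text was produced by reading UTF-8 bytes as a single-byte encoding, and it guesses that the encoding was Windows-1252 (a latin1 superset), because the sample Ä“ contains a curly quote that plain latin1 has no printable character for. The function name fix_mojibake is made up for this example.

```python
def fix_mojibake(text: str) -> str:
    """Undo 'UTF-8 bytes read as a single-byte encoding' damage.

    Assumption: each character in `text` stands for exactly one original
    byte. Windows-1252 is tried first; characters it cannot encode (for
    example U+0081) fall back to latin1, which maps U+0000..U+00FF to
    the same byte values.
    """
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Raises again if ch fits neither encoding, i.e. the text
            # was not damaged in the assumed way.
            raw += ch.encode("latin-1")
    # Raises UnicodeDecodeError if the recovered bytes are not UTF-8.
    return raw.decode("utf-8")


# The four samples from the question; the last one ends in the
# invisible control character U+0081.
for sample in ["Å¡", "Ä«", "Ä“", "Ä\x81"]:
    print(repr(sample), "->", fix_mojibake(sample))
```

Text that is already correct (for example a real ā) cannot be mapped back to single bytes and raises an error instead of being silently changed, so it is safer to run something like this on a copy of the dump and review the result.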

Upvotes: 0

Jon Skeet

Reputation: 1499800

If you're seeing multiple bytes for what should be single characters, chances are it's already in UTF-8. Bear in mind that ISO-8859-1 is a single-byte-per-character encoding, whereas UTF-8 can take multiple bytes - and any non-ASCII character does take multiple bytes.
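To make the byte counts concrete, here is a small Python sketch (the helper names are invented for this example). It shows how many bytes a character takes in UTF-8 and what the same bytes look like when they are wrongly interpreted as ISO-8859-1.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode those bytes as ISO-8859-1.

    Every byte becomes one character, so each multi-byte UTF-8
    sequence turns into several unrelated characters.
    """
    return text.encode("utf-8").decode("latin-1")


for ch in "aāšī":
    print(repr(ch), utf8_length(ch), "byte(s) in UTF-8")

print(misread_as_latin1("š"))  # prints "Å¡"
print(misread_as_latin1("ī"))  # prints "Ä«"
```

The two-character sequences printed at the end match the garbled forms quoted in the question, which is consistent with the file already holding UTF-8 bytes.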

I suggest you open the file in a UTF-8-aware text editor, and check it there.
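If no suitable editor is at hand, a rough programmatic check is also possible. The Python sketch below only tests whether a byte string decodes cleanly as UTF-8; the function name is_valid_utf8 and the file name dump.sql in the comment are placeholders, not anything from the question.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# In-memory demonstration:
print(is_valid_utf8("š".encode("utf-8")))    # True
print(is_valid_utf8("é".encode("latin-1")))  # False: a lone 0xE9 byte

# For a real file (hypothetical name), read it in binary mode:
# with open("dump.sql", "rb") as f:
#     print(is_valid_utf8(f.read()))
```

Pure ASCII data passes this check under either encoding, and a file that was encoded to UTF-8 twice also passes, so a True result is a hint rather than proof that the text will display correctly.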

Upvotes: 2
