jraede

Reputation: 6896

Dealing with strange encoding for PHP/MySQL import

We get a daily upload of a CSV file from a client, who says it is UTF-16LE encoded. However, when I run iconv('UTF16-LE', 'UTF8') on each line of the CSV file, the data looks like this when it goes into the database:

Z�A�A�0�7�3�7

I.e., there's one of those [?] things between every character.

I tried utf8_encode and various combinations of iconv with different encoding types to make this go away. Has anyone run into this, and does anyone know how to convert an unknown or unsupported encoding into UTF-8, or at least into something readable by PHP and MySQL?

Upvotes: 1

Views: 132

Answers (1)

mbarlocker

Reputation: 1386

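UTF-16 and UTF-8 both encode every Unicode character, so any valid UTF-16 text can be converted to UTF-8 without loss. The difference is the code unit size: UTF-16 uses 16-bit units, so plain ASCII characters such as Z or A take two bytes each, one of which is a zero byte. A [?] between every character usually means those zero bytes are still in the data, i.e. the conversion did not actually happen, or the bytes were not aligned UTF-16 when they were converted.

UTF-16 files often start with a byte order mark (BOM) that says whether the data is LE or BE: FF FE for little-endian, FE FF for big-endian. You could try converting from plain UTF-16 (no 'LE'), which lets iconv use the BOM if one is present. This would tell you whether your client is wrong about LE. Also check the return value: the standard names are 'UTF-16LE' and 'UTF-8', and iconv returns false when the conversion fails. Finally, converting line by line is risky: a newline in UTF-16LE is the two bytes 0A 00, so a byte-oriented line reader can split it in the middle and leave the following lines misaligned. It is safer to convert the whole file first and split it into lines afterwards.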
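Here is a minimal sketch of the BOM check, assuming the file really is UTF-16 and that PHP's iconv extension is available. The helper name utf16ToUtf8 and the sample data are hypothetical, made up for illustration. It converts the whole buffer before splitting it into lines, so a two-byte newline is never cut in half.

```php
<?php
// Hypothetical helper: pick the byte order from the BOM (if any),
// strip the BOM, and convert the whole buffer to UTF-8.
function utf16ToUtf8(string $bytes, string $fallback = 'UTF-16LE'): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {          // little-endian BOM
        $from = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif ($bom === "\xFE\xFF") {    // big-endian BOM
        $from = 'UTF-16BE';
        $bytes = substr($bytes, 2);
    } else {                            // no BOM: trust the caller's guess
        $from = $fallback;
    }

    $utf8 = iconv($from, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $utf8;
}

// Made-up sample standing in for the client's file: BOM + UTF-16LE text.
$raw = "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', "ZAA0737\r\nZAB0001\r\n");

// Convert first, then split on any line ending.
$utf8  = utf16ToUtf8($raw);
$lines = preg_split('/\R/', $utf8, -1, PREG_SPLIT_NO_EMPTY);

// A hex dump of the first bytes is a quick way to check for a BOM.
echo bin2hex(substr($raw, 0, 6)), "\n";
echo $lines[0], "\n";
```

For a real upload you would read the file with file_get_contents() and pass the result to the helper. If the hex dump shows neither FF FE nor FE FF, the file has no BOM and the byte order has to be guessed or confirmed with the client.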

If the conversion still cannot be made reliable, a fallback would be to store the data as byte arrays (BINARY(x)) in the database, not as text.

Upvotes: 2
