kevlaria

Reputation: 33

MySQL: Inserting Traditional & Simplified Chinese in the same 'cell'

newbie here!

I have source data that contains both simplified and traditional Chinese in the same 'cell' (sorry, newbie using Excel speak here!), which I'm trying to load into MySQL using LOAD DATA INFILE.

The offending text is "到达广州新冶酒吧!一杯芝華士 嘈雜的音樂 行行色色的男女". It's got both simplified Chinese ("广") and traditional Chinese ("華").

When I load it into MySQL, I get the following error:

Error Code: 1366. Incorrect string value: '\xF0\xA3\x8E\xB4\xE8\x83...' for column 'Description' at row 2

The database uses the default UTF-8 collation, and the input file is also UTF-8 encoded.

Is there any way I can either:

a) Make MySQL accept this row of data (ideal), or b) Get MySQL to skip inserting this line of data?

Thanks! Do let me know if you need further detail.

Kevin

Upvotes: 2

Views: 1954

Answers (1)

prosfilaes

Reputation: 1308

If 😼 was tripping it up, that's because 😼 is not in the Basic Multilingual Plane of Unicode; it's in the Supplementary Multilingual Plane, which is above U+FFFF, so it takes up 4 bytes in UTF-8 instead of at most 3. Fully conformant Unicode implementations treat such characters no differently, but MySQL's utf8 charset stores at most 3 bytes per character and so doesn't accept characters above U+FFFF. If you have MySQL 5.5.3 or later, you can ALTER TABLE to use utf8mb4, which properly handles all Unicode characters. There are some catches to changing, as MySQL reserves up to 4 bytes per character instead of 3, which affects maximum column and index lengths; see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html for the details.
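To see this for yourself, here is a rough Python sketch that decodes the first four bytes quoted in the error message and checks whether the resulting code point sits above U+FFFF. The helper names `has_non_bmp` and `split_rows` are made up for illustration and are not part of MySQL or any library; the second one is just one possible way to pre-filter rows before loading, if skipping them is acceptable.

```python
# First four bytes quoted in the MySQL error message.
error_bytes = b"\xF0\xA3\x8E\xB4"

# Decode them as UTF-8 and look at the code point.
char = error_bytes.decode("utf-8")
code_point = ord(char)
print(f"U+{code_point:04X}", len(error_bytes), code_point > 0xFFFF)
# prints: U+233B4 4 True


def has_non_bmp(text: str) -> bool:
    """Return True if text contains any character above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)


def split_rows(lines):
    """Hypothetical helper: separate rows that fit a 3-byte utf8 column
    from rows that contain a 4-byte character."""
    ok, rejected = [], []
    for line in lines:
        (rejected if has_non_bmp(line) else ok).append(line)
    return ok, rejected
```

Note that "广" and "華" themselves are both below U+FFFF, so mixing simplified and traditional characters is not what triggers the error; `has_non_bmp("广華")` returns `False`.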

This issue is a duplicate of "Inserting UTF-8 encoded string into UTF-8 encoded mysql table fails with 'Incorrect string value'".

Upvotes: 3
