LTM
LTM

Reputation: 557

Mysql changing the collation from latin1_swedish_ci to utf8mb4_bin

One of my columns has special characters like "Ψ" which latin1_swedish_ci does not seem to support. To handle such cases I tried switching the collation to utf8mb4_bin for the one column. But when converting the data i get:

Query error:
#1366 - Incorrect string value: '\xE4' for column 'title' at row 18

Note: I'm using phpmyadmin to change it. Why am i getting this error? I thought utf8mb4_bin could support it?

Upvotes: 1

Views: 3293

Answers (1)

Rick James
Rick James

Reputation: 142208

You are in for a rocky road. Hex E4 has nothing to do with Psi. Where did it come from?

Do SELECT title, HEX(title) ... on some title that has a non-Ascii character in it.

The UTF-8 encoding for psi is two hex characters CEA8.

E4, when interpreted as latin1 represents ä. Does that make any sense?

  • The column needs to be changed to CHARACTER SET utf8 (or utf8mb4). But there is a right way to do that and a way that messes things up worse. Let's see that HEX before discussing which is right.
  • The VARIABLEs that control what encoding the client is using. There are multiple settings that need to change when switching encodings.

More references:

No charset maps E4 to Psi:

cp1250, cp1257, dec8, latin1, latin2, latin5, latin7  25 24
                                                     'Kaze no Tani no Nausicaä'
                                        cp852  25 24 'Kaze no Tani no Nausicań'
                                        cp850  25 24 'Kaze no Tani no Nausicaõ'
                                        macce  25 24 'Kaze no Tani no Nausicaš'
                                          hp8  25 24 'Kaze no Tani no Nausicað'
                                        greek  25 24 'Kaze no Tani no Nausicaδ'
                                      keybcs2  25 24 'Kaze no Tani no NausicaΣ'
                                       cp1251  25 24 'Kaze no Tani no Nausicaд'
                                 koi8r, koi8u  25 24 'Kaze no Tani no NausicaД'
                                        cp866  25 24 'Kaze no Tani no Nausicaф'
                                     armscii8  25 24 'Kaze no Tani no NausicaՊ'
                                       hebrew  25 24 'Kaze no Tani no Nausicaה'
                                       cp1256  25 24 'Kaze no Tani no Nausicaن'
                                       tis620  26 24 'Kaze no Tani no Nausicaไ'
                                      geostd8  26 24 'Kaze no Tani no Nausicaჰ'
                                     macroman  26 24 'Kaze no Tani no Nausica‰'

So, I worry that two mistakes have been made. Do you have another example of mangled text?

Latin1 cannot handle Greek letters. Nor Cyrillic. Nor Chinese. Etc. So, ä is "correct"? (I have been chasing how to get between E4 and Psi.)

So you should probably convert to utf8mb4.

ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

will convert all the text columns in table t.

To change just one column:

ALTER TABLE t MODIFY COLUMN c VARCHAR(...)
     CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

It must contain all the other stuff you have not (eg, NULL or NOT NULL).

Upvotes: 4

Related Questions