MySQL is too smart about accented characters

Question

I guess people would normally want their program to behave like this, but in my case it is the complete opposite of what I want.

Somehow, my MySQL database treats differently accented characters as identical. For instance, shī, shí, shǐ, shì and shi are all the same thing to it. When I search for one, I get the others as well. Screenshot as proof:

[Image: smart SQL]

This is not what I want, since for me those values are very different. The query in the screenshot should return no rows, because there is not a single entry in that table with shi (without an accent).

My table type is InnoDB and the collation is utf8_general_ci.

Mchl · Accepted Answer

Use the utf8_bin collation. You don't have to change the collation of the entire column; you can apply it on a per-query basis:

WHERE `pinyin` = 'shi' COLLATE utf8_bin
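
As a rough sketch, the per-query form in a full statement could look like the following. The table name `words` and the column type `VARCHAR(64)` are assumptions made for illustration; only the `pinyin` column comes from the question. The second statement shows the permanent alternative of changing the column's collation, which may be preferable if every comparison on that column should be accent-sensitive.

```sql
-- Per-query: force a binary comparison for this lookup only.
-- `words` is a hypothetical table name.
SELECT *
FROM `words`
WHERE `pinyin` = 'shi' COLLATE utf8_bin;

-- Column-level alternative: make every comparison on the column binary.
-- VARCHAR(64) is an assumed type; MODIFY needs the full column
-- definition, so substitute the column's real one.
ALTER TABLE `words`
  MODIFY `pinyin` VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin;
```

One trade-off to keep in mind: when the column itself is still utf8_general_ci, a comparison forced into another collation may not be able to use an index on that column, so the column-level change can be the better fit for large tables.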

You can also experiment with other collations, which might work better for you. utf8_bin compares at the binary level, so even if two Unicode strings with different byte sequences represent the same text, it will treat them as different.
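
To see the difference without touching any table, the two collations can be compared on string literals. This is a small sketch that assumes the client sends UTF-8; it uses the `_utf8` introducer so the literals carry the utf8 character set. On a utf8mb4 connection the corresponding collation names would be `utf8mb4_general_ci` and `utf8mb4_bin`.

```sql
-- Each column is 1 when the two strings compare equal, 0 otherwise.
SELECT
  _utf8'shí' = _utf8'shi' COLLATE utf8_general_ci AS general_ci_equal, -- accent ignored
  _utf8'shí' = _utf8'shi' COLLATE utf8_bin        AS bin_equal,        -- accent respected
  _utf8'Shi' = _utf8'shi' COLLATE utf8_bin        AS bin_case_equal;   -- case respected too
```

The last column is there as a reminder that a binary collation is also case-sensitive, so 'Shi' and 'shi' stop matching as well. If that is not wanted, it is worth checking whether another collation available on the server separates accents while still ignoring case.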
