MySQL not comparing UTF-8 strings correctly?

Question

I know it sounds weird, but look at this:

mysql> select * from tbl_list_charset where word='aê';
+------+
| word |
+------+
| aª  | 
+------+

The data comes from a file of UTF-8 strings, which a Python program reads and inserts into the table. Because the word column is defined as unique, the insertion of aê fails.

The UTF-8 representation of the strings in the file is:

aê = 61 C3 AA
aª = 61 C2 AA
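
The byte sequences listed above can be double-checked with a few lines of Python. This is only a sketch and assumes Python 3 (the question uses Python 2.6.4, where string handling differs); the names `utf8_hex`, `word_circumflex`, and `word_ordinal` are made up for illustration.

```python
# Assumption: Python 3, where str is Unicode and encode() returns bytes.
word_circumflex = "a\u00ea"  # 'a' followed by U+00EA (e with circumflex)
word_ordinal = "a\u00aa"     # 'a' followed by U+00AA (feminine ordinal indicator)


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join("%02X" % byte for byte in text.encode("utf-8"))


print(utf8_hex(word_circumflex))  # 61 C3 AA
print(utf8_hex(word_ordinal))     # 61 C2 AA
```

The two words differ only in the second byte (C3 versus C2), so they are distinct at the byte level.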

My environment: Linux, Python 2.6.4, MySQL 5.0.77 Community Edition

I am quite sure it is not a bug, but I have no idea what I am doing wrong...

goat · Accepted Answer

The collation determines which characters compare as "equal", and there are quite a few situations like this one. You can try the utf8_bin collation and you won't have this problem, but it will be case sensitive. The bin collations compare strictly: they only separate the characters according to the selected encoding, and once that's done, comparisons are done on a binary basis, much like many programming languages would compare strings.
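
As a rough illustration of what "comparing on a binary basis" means, here is a minimal Python sketch (Python 3 assumed). It is not how MySQL implements utf8_bin; the helper `binary_equal` is a hypothetical name, and it simply compares the encoded bytes of two strings.

```python
def binary_equal(left, right, encoding="utf-8"):
    """Compare two strings by their encoded bytes, with no folding of any kind."""
    return left.encode(encoding) == right.encode(encoding)


# The two words from the question have different bytes, so they are not equal.
print(binary_equal("a\u00ea", "a\u00aa"))  # False

# The price of a strict comparison: letter case now matters too.
print(binary_equal("word", "Word"))  # False
print(binary_equal("word", "word"))  # True
```

Under this kind of comparison the unique constraint would see aê and aª as different values, but 'word' and 'Word' would also count as different.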

If you need something in between this extreme and your current collation, you can make a custom collation. Or you might be able to get it "good enough" by storing a second column with a different collation, and using each column for a specific purpose.
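
The two-column idea can be sketched as follows. To keep the example runnable without a MySQL server, it uses Python's built-in `sqlite3` module and SQLite's `BINARY` and `NOCASE` collations as stand-ins; these are not MySQL collations, and the table and column names (`words`, `word_bin`, `word_ci`) are invented for illustration. In MySQL the same layout would use MySQL collation names on each column instead.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL here; only the idea carries over.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words ("
    " word_bin TEXT COLLATE BINARY UNIQUE,"  # strict column: enforces uniqueness
    " word_ci TEXT COLLATE NOCASE"           # loose column: used for lookups
    ")"
)

# Each word is stored twice, once per column.
for word in ["a\u00ea", "a\u00aa", "Apple", "apple"]:
    conn.execute(
        "INSERT INTO words (word_bin, word_ci) VALUES (?, ?)", (word, word)
    )

# Exact lookup goes through the binary column.
exact = conn.execute(
    "SELECT word_bin FROM words WHERE word_bin = ?", ("apple",)
).fetchall()

# Loose lookup goes through the case-insensitive column.
loose = conn.execute(
    "SELECT word_bin FROM words WHERE word_ci = ? ORDER BY word_bin", ("APPLE",)
).fetchall()

print(exact)  # [('apple',)]
print(loose)  # [('Apple',), ('apple',)]
```

All four inserts succeed because uniqueness is checked on the strict column, while searches that should ignore case go through the loose one. The cost is storing every word twice.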
