davka
davka

Reputation: 14702

MySql not comparing utf-8 strings correctly?

I know it sounds weird, but look at this:

mysql> select * from tbl_list_charset where word='aê';
+------+
| word |
+------+
| aª  | 
+------+

The data is coming from a file with utf-8 strings, which a python program reads and inserts into the table. As word column is defined unique, the insertion of fails.

The utf-8 representation of the strings in the file is:

aê = 61 C3 AA
aª = 61 C2 AA

My environment: linux, python 2.6.4, mysql 5.0.77 community edition

I am quite sure it is not a bug, but I am clueless of what I am doing wrong...

Upvotes: 2

Views: 2402

Answers (2)

goat
goat

Reputation: 31854

The collation determines which characters compare as "equal". And yes, there's quite a few of these situations. You can try the utf8_bin collation and you wont have this problem, but it will be case sensitive. The bin collations compare strictly, only seperating the characters out aqccording to the encoding selected, and once that's done, comparisons are done on a binary basis, much like manhy programming languages would compare strings.

If you need something in between this extreme and your current collation, you can make a custom collation. Or, you might be able to get it "good enough" by storing another column, and using a different collation on it, and just each col for specific purposes.

Upvotes: 2

Arc
Arc

Reputation: 11306

Do you also use UTF-8 with the mysql client program as well as in your Python application ?
I.e. call mysql --default-character-set=utf8 and in Python issue at least one SET NAMES='utf8' before doing any other queries ?

Upvotes: 1

Related Questions