Reputation: 14702
I know it sounds weird, but look at this:
mysql> select * from tbl_list_charset where word='aê';
+------+
| word |
+------+
| aª |
+------+
The data is coming from a file with utf-8 strings, which a python program reads and inserts into the table. As word
column is defined unique, the insertion of aê
fails.
The utf-8 representation of the strings in the file is:
aê = 61 C3 AA
aª = 61 C2 AA
My environment: linux, python 2.6.4, mysql 5.0.77 community edition
I am quite sure it is not a bug, but I am clueless of what I am doing wrong...
Upvotes: 2
Views: 2402
Reputation: 31854
The collation determines which characters compare as "equal". And yes, there's quite a few of these situations. You can try the utf8_bin collation and you wont have this problem, but it will be case sensitive. The bin collations compare strictly, only seperating the characters out aqccording to the encoding selected, and once that's done, comparisons are done on a binary basis, much like manhy programming languages would compare strings.
If you need something in between this extreme and your current collation, you can make a custom collation. Or, you might be able to get it "good enough" by storing another column, and using a different collation on it, and just each col for specific purposes.
Upvotes: 2
Reputation: 11306
Do you also use UTF-8 with the mysql
client program as well as in your Python application ?
I.e. call mysql --default-character-set=utf8
and in Python issue at least one SET NAMES='utf8'
before doing any other queries ?
Upvotes: 1