Reputation: 2772
I'm currently developing a website that will show content in almost any language in the world, and I'm having trouble choosing the best collation to define in MySQL.
Which one is the best to support all characters? Or the most accurate?
Or is it just best to convert all characters to Unicode?
Upvotes: 26
Views: 15880
Reputation: 1
utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.
utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utf8mb4 instead.
So prefer utf8mb4.
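To make the "one to three" versus "one to four" bytes distinction concrete, here is a small Python sketch (not MySQL code) that measures how many bytes a character needs in UTF-8. The helper names `utf8_width` and `fits_in_utf8mb3` are made up for illustration, and the second one only approximates what a utf8mb3 column can hold by assuming the limit is three bytes per character.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most three UTF-8 bytes.

    Hypothetical helper: an approximation of what utf8mb3 can store.
    """
    return all(utf8_width(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # Latin letter, accented letter, euro sign, emoji
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X} needs {utf8_width(ch)} byte(s)")

    print(fits_in_utf8mb3("Gr\u00fc\u00dfe"))       # accented Latin text
    print(fits_in_utf8mb3("Hello \U0001F600"))      # contains an emoji
```

Characters such as emoji sit outside the three-byte range, which is the practical reason to pick utf8mb4 for a site that must accept arbitrary text.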
Upvotes: 0
Reputation: 2967
Using utf8mb4_unicode_ci or utf8mb4_general_ci can be tricky and cause unexpected behavior, so be aware.
Perhaps utf8mb4_bin can be a good option if you want to avoid such cases.
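One kind of surprise with `_ci` collations is that strings differing only in letter case or accents compare as equal, while a binary collation treats them as different. The Python sketch below only imitates that effect. It is not MySQL's algorithm, and `binary_equal` and `loose_equal` are hypothetical names chosen for this illustration.

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    """Compare the encoded bytes, roughly what a binary collation does."""
    return a.encode("utf-8") == b.encode("utf-8")


def loose_equal(a: str, b: str) -> bool:
    """Rough stand-in for a case- and accent-insensitive comparison.

    Assumption: this imitates the visible effect only; MySQL uses its
    own collation weight tables rather than this normalization trick.
    """
    def fold(s: str) -> str:
        # Split accented letters into base letter + combining mark
        decomposed = unicodedata.normalize("NFD", s)
        # Drop the combining marks, then ignore letter case
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.casefold()

    return fold(a) == fold(b)


if __name__ == "__main__":
    print(loose_equal("resume", "R\u00e9sum\u00e9"))   # case and accents ignored
    print(binary_equal("resume", "R\u00e9sum\u00e9"))  # bytes differ
```

If a column must distinguish such values, for example in a unique index, a binary collation is the stricter choice, at the cost of less natural sorting.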
Upvotes: 0
Reputation: 413
Use utf8mb4 instead of utf8.
utf8mb4_general_ci => supports characters of 1, 2, 3, or 4 bytes
and
utf8_general_ci or utf8mb3_general_ci => supports characters of 1, 2, or 3 bytes
Each character takes up only as much disk space as it requires.
Upvotes: 0
Reputation: 2634
The accepted answer is wrong (maybe it was right in 2009).
utf8mb4_unicode_ci is the best collation to use for wide language support.
Reasoning and supporting evidence:
You want to use utf8mb4 rather than utf8 because the latter only supports characters of up to 3 bytes, and you want to support 4-byte characters. (ref)
and
You want to use unicode rather than general because the latter never sorted correctly. (ref)
Upvotes: 36
Reputation: 2202
I generally use the 8-bit UCS/Unicode Transformation Format (UTF-8), which works perfectly for any (well, most) languages:
utf8_general_ci
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
Upvotes: 23