Pedro Luz

Reputation: 2772

MySQL collation for all languages

I'm currently developing a website that will display content in almost any language in the world, and I'm having trouble choosing the best collation to set in MySQL.

Which one is the best to support all characters? Or the most accurate?

Or is it best just to convert all characters to Unicode?

Upvotes: 26

Views: 15880

Answers (5)

Suresh

Reputation: 1

From the MySQL website:

utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utf8mb4 instead.

utf8: An alias for utf8mb3. In MySQL 8.0, this alias is deprecated; use utf8mb4 instead. utf8 is expected in a future release to become an alias for utf8mb4.

So prefer utf8mb4.
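To make the one-to-four-byte distinction concrete, here is a small Python sketch (not MySQL code) that prints how many bytes a few characters take in UTF-8. The sample characters and the `utf8_len` helper are my own choices for illustration; anything that needs 4 bytes cannot be stored in a utf8mb3 column.

```python
# Sample characters picked for illustration: A, e-acute, euro sign, grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def utf8_len(ch: str) -> int:
    """Return the number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    n = utf8_len(ch)
    fits = "yes" if n <= 3 else "no"
    print(f"U+{ord(ch):04X}: {n} byte(s), fits in utf8mb3: {fits}")
```

Only the last sample (the emoji, U+1F600) needs 4 bytes, which is exactly the case utf8mb4 exists for.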

Upvotes: 0

FabianoLothor

Reputation: 2967

Using utf8mb4_unicode_ci or utf8mb4_general_ci can be tricky and cause unexpected behavior.

Be aware.

Perhaps utf8mb4_bin (the binary collation for utf8mb4) is a good option if you want to avoid cases like the one below.

[image not available]
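The original screenshot is not reproduced here, so the following is only a guess at the kind of surprise meant: with the _ci collations, strings that differ only in case or accents compare as equal, while a _bin collation compares them exactly. The Python sketch below imitates that difference. It is an analogy under my own assumptions, not MySQL's collation algorithm, and both helper names are hypothetical.

```python
import unicodedata

def binary_equal(a: str, b: str) -> bool:
    """Compare the raw UTF-8 bytes, roughly what a _bin collation does."""
    return a.encode("utf-8") == b.encode("utf-8")

def loose_equal(a: str, b: str) -> bool:
    """Illustrative approximation of a case- and accent-insensitive comparison.
    This is NOT the algorithm MySQL uses for its _ci collations."""
    def fold(s: str) -> str:
        # Decompose accented letters, drop the combining marks, then fold case.
        decomposed = unicodedata.normalize("NFD", s)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.casefold()
    return fold(a) == fold(b)

# "R\u00e9sum\u00e9" is "Resume" written with two e-acute characters.
print(binary_equal("resume", "R\u00e9sum\u00e9"))  # False
print(loose_equal("resume", "R\u00e9sum\u00e9"))   # True
```

If a unique index or a login lookup relies on exact matches, the second behavior is the one that tends to cause trouble.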

Upvotes: 0

Deepak Kumar

Reputation: 413

Use utf8mb4 instead of utf8.

utf8mb4_general_ci supports characters of 1, 2, 3, or 4 bytes,

whereas

utf8_general_ci (also known as utf8mb3_general_ci) supports characters of 1, 2, or 3 bytes.

Each character takes only as much disk space as it requires.

Upvotes: 0

Gerbus

Reputation: 2634

The accepted answer is wrong (maybe it was right in 2009).

utf8mb4_unicode_ci is the best collation to use for wide language support.

Reasoning and supporting evidence:

You want to use utf8mb4 rather than utf8 because the latter supports characters of at most 3 bytes, and you want to support 4-byte characters as well. (ref)

and

You want to use unicode rather than general because the latter never sorted correctly. (ref)
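As a rough way to check whether existing text actually contains 4-byte characters, here is a Python sketch (not MySQL code) that lists the characters above U+FFFF in a string. The helper name `supplementary_chars` and the sample text are hypothetical, chosen only for illustration.

```python
from typing import List

def supplementary_chars(text: str) -> List[str]:
    """Return the characters above U+FFFF, which need 4 bytes in UTF-8
    and therefore cannot be stored in a utf8 (utf8mb3) column."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

# Sample text: accented Latin, an emoji, two common CJK characters,
# and one CJK character from outside the Basic Multilingual Plane.
text = "caf\u00e9 \U0001F600 \u4e2d\u6587 \U00020000"
found = supplementary_chars(text)
print([f"U+{ord(ch):04X}" for ch in found])  # ['U+1F600', 'U+20000']
```

Note that 4-byte characters are not just emoji: some CJK characters fall in that range too, which matters for a site meant to cover almost every language.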

Upvotes: 36

stone

Reputation: 2202

I generally use the 8-bit UCS/Unicode Transformation Format (UTF-8), which works perfectly for any (well, most) languages:

utf8_general_ci

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

Upvotes: 23
