Reputation: 924
I'm creating a website that will store tutorial videos in several different languages. English will be the primary audience, but I expect French accents to be used in usernames/passwords, along with Swedish/Norwegian accented characters as well.
The tutorial videos will also be offered in Chinese (both Cantonese and Mandarin), Urdu/Hindi, Farsi/Dari, and Arabic. While I'm pretty sure users of the last few languages use standard QWERTY keyboards online, especially when registering, I do know that European keyboards vary and include several accents and ligatures.
As far as MySQL is concerned, which collation would be best suited for storing usernames and email addresses so that the most probable entries are supported? I know I probably cannot cover them all, but I'd like to cover as much as possible.
I've read that utf8_general_ci is better, but how would it differ from latin1_swedish_ci if I'm looking to support those Scandinavian characters?
EDIT: the user_id and email fields will be unique, so two addresses that differ only by an accented character (e.g. "e" vs "é") must not be treated as the same address.
Upvotes: 2
Views: 2125
Reputation: 521995
The collation is pretty irrelevant here for storing data; it only specifies the rules for comparison and sorting. What you need is the right charset, which should be utf8. If your MySQL version is 5.5.3 or later, you should even use utf8mb4 or utf16, both of which cover the entirety of Unicode (MySQL's utf8 is a limited subset of real UTF-8, covering only the BMP). A latin1 charset limits you to the 256 characters defined in it.
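To make the charset point concrete, here is a small Python sketch (nothing here talks to MySQL) that checks how many UTF-8 bytes a few characters from the question's languages need and whether they exist in latin1. It assumes Python's "latin-1" codec (ISO-8859-1) as a stand-in for MySQL's latin1, which is really closer to cp1252, and treats "at most 3 bytes" as the limit of MySQL's legacy 3-byte utf8. The function names are hypothetical.

```python
# Sketch only: inspects characters in Python, does not talk to MySQL.
# Assumption: Python's "latin-1" codec (ISO-8859-1) stands in for MySQL's
# latin1 (really closer to cp1252; same result for these samples).

SAMPLES = {
    "French": "é",
    "Swedish": "å",
    "Norwegian": "ø",
    "Chinese": "中",
    "Arabic": "ع",
    "Rare CJK (outside BMP)": "\U00020000",
}


def utf8_len(ch: str) -> int:
    """Number of bytes the character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_latin1(ch: str) -> bool:
    """True if the character is one of the 256 latin-1 characters."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def fits_mysql_utf8mb3(ch: str) -> bool:
    """MySQL's legacy 3-byte utf8 can only hold BMP characters (<= 3 bytes)."""
    return utf8_len(ch) <= 3


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(f"{label:24} bytes={utf8_len(ch)} "
              f"latin1={fits_latin1(ch)} utf8mb3={fits_mysql_utf8mb3(ch)}")
```

The French, Swedish and Norwegian letters fit in latin1, but the Chinese and Arabic ones do not, and a character outside the BMP such as U+20000 needs 4 bytes, which only utf8mb4 (or utf16) can store.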
If you want to prevent similar entries from being treated as the same thing, use the appropriate _bin collation.
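A rough Python sketch of the difference between a _bin collation and a case/accent-insensitive _ci one. The loose comparison below is only an approximation (Unicode decomposition plus casefolding), not MySQL's actual collation algorithm, and the function names are hypothetical. It matters for the question's unique user_id and email columns, because MySQL checks a UNIQUE index using the column's collation, so under a _ci collation two values that compare equal will collide.

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    """Like a *_bin collation: equal only if the encoded bytes match exactly."""
    return a.encode("utf-8") == b.encode("utf-8")


def loose_key(s: str) -> str:
    """Hypothetical approximation of a case- and accent-insensitive (_ci) rule:
    decompose accented letters (NFD), drop the combining marks, then casefold.
    Letters with no Unicode decomposition, such as ø, are left unchanged.
    This is NOT MySQL's actual collation algorithm."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def loose_equal(a: str, b: str) -> bool:
    """Compare two strings using the approximate _ci key."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    print(binary_equal("fred", "fréd"))  # False: distinct under _bin
    print(loose_equal("fred", "fréd"))   # True: accent ignored
    print(loose_equal("Fred", "fred"))   # True: case ignored too
```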
Upvotes: 1
Reputation: 29381
I wouldn't use utf8_general_ci; use utf8_unicode_ci instead. It has much better support for sorting and comparisons, and there are language-specific collations derived from utf8_unicode_ci, for example utf8_swedish_ci, which gives the correct Swedish sorting and comparison.
The downside is that it's somewhat slower than utf8_general_ci, but IMO you gain much more than you lose.
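To illustrate why a language-specific collation such as utf8_swedish_ci matters, here is a simplified Python sketch comparing an accent-folding sort (roughly how a general-purpose collation orders å, ä and ö, i.e. as a and o) with Swedish alphabetical order, where å, ä and ö are separate letters after z. The alphabet table and key functions are hypothetical simplifications, not MySQL's implementation.

```python
import unicodedata

# Hypothetical, simplified Swedish alphabet: å, ä and ö are separate letters
# placed after z. Real collations such as utf8_swedish_ci have many more rules.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def accent_folding_key(s: str) -> str:
    """Rough stand-in for a general-purpose collation: å/ä sort as a, ö as o."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def swedish_key(s: str) -> list:
    """Rank each letter by its position in the Swedish alphabet.
    Characters not in the table sort after ö, ordered by code point."""
    folded = unicodedata.normalize("NFC", s).casefold()
    return [(_RANK.get(c, len(SWEDISH_ALPHABET)), c) for c in folded]


if __name__ == "__main__":
    names = ["Örjan", "Anna", "Zlatan", "Åsa", "Olle"]
    print(sorted(names, key=accent_folding_key))
    # ['Anna', 'Åsa', 'Olle', 'Örjan', 'Zlatan']
    print(sorted(names, key=swedish_key))
    # ['Anna', 'Olle', 'Zlatan', 'Åsa', 'Örjan']
```

With the accent-folding key, "Åsa" lands next to "Anna"; with the Swedish ordering it moves after "Zlatan", which is what a Swedish reader expects.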
Upvotes: 0