Chris Muench
Chris Muench

Reputation: 18318

Converting mysql database to support multiple languages

I currently have a database and most of the tables and fields are stored as latin1_swedish_ci.

My questions are:

  1. Should I use utf8_unicode_ci for the tables and their fields?
  2. What is the difference between a collation for a table vs a field?
  3. Can I mass change all fields/tables to utf8_unicode_ci? (if thats the right collation)
  4. Can any data loss happen in the fields/tables?

The reason I want to use utf8 is because I need to store Chinese and/or Russian chars in the database.

Upvotes: 3

Views: 6816

Answers (1)

Wrikken
Wrikken

Reputation: 70490

  1. I'd go for it. There are very few reasons not to go with utf8 when working with multiple languages.
  2. The table character set and collation are used as default values for column definitions if the column character set and collation are not specified in individual column definitions.
  3. If you don't have individual collations/character sets on fields, you can write a simple scripts that loops through all tables and issues ALTER TABLE tablename CONVERT TO CHARSET utf8;. If your fields have custom character sets/collations, you'll have to issue a MODIFY COLUMN statement. You can easily identify those querying the information_schema.columns table after your conversion.
  4. Yes, if data cannot be represented in utf8 (which is unlikely given your current collation), data can break. Also, if you stored utf8 text in columns which weren't actually utf8, converting those fields results in something like a double utf8-encoding, which is a pretty inconvenient mess to clean up. If you know a column to be wrongly used to story utf8, you can convert it to bin & then to utf8 to prevent conversion/double encoding.

Upvotes: 4

Related Questions