Bobby
Bobby

Reputation: 113

mySQL Character Sets

I noticed today that our database uses character set "utf8 -- UTF-8 Unicode" and collation "utf8_general_ci" but most of the tables and columns inside are using CHARSET=latin1. Will I run into any problems with this?

The reason I ask is because we have been running into a lot of problems syncing data between two database.

Upvotes: 1

Views: 643

Answers (3)

Isotopp
Isotopp

Reputation: 3383

For an overview of MySQL character sets, read for example http://mysqldump.azundris.com/archives/60-Handling-character-sets.html

The server, a schema/database and a table have no character sets, they have just defaults that are inherited downwards (server to schema to table). Columns that are of a CHAR, VARCHAR or any TEXT type have character sets, and do so on a per column basis. If no specific character set is defined for them, they inherit from the table.

Inheritance for all these objects happens at object creation time.

The other thing that has a character set is the connection. Since the connection is the collection of things the server knows about the client, the character set of the connection should be set to whatever character set you are using in your client.

MySQL will then correctly convert between the character set of a column and the character set of a connection. Usually there are no problems with that.

The most common problem PEOPLE have with it is lying to the server, that is, setting the character set of a connection to something different from what the client is actually sending or using. This can be done at runtime by sending the command SET NAMES ... as the first thing at connection setup, and it is very important that you specify the correct thing here.

If you do, and for example send latin1 data into a connection that has been SET NAMES latin1, storing data into a latin1 column will not convert data, whereas storing data into a utf8 column will convert your latin1 umlauts (ö = F6) into utf8 umlauts (ö = C3 B6) on disk. Reading will transparently convert back, if the connection is properly set up.

In your setup, if your connection is SET NAMES utf8 and you are sending data to a latin1 column, only data that can be represented in latin1 can be stored. There will be data truncation, and a data truncation warning if you for example try to store japanese hiragana in such a latin1 column.

Upvotes: 2

Jon
Jon

Reputation: 437376

You will run into problems if there's a possibility of storing "international" text -- that is, non-latin characters.

If I understand what you 're posting correctly, this means that the default for new tables in your database is UTF-8, but your existing tables use latin-1. That could be a problem. Depends on your data, as mentioned above.

Upvotes: 0

Jan Zyka
Jan Zyka

Reputation: 17898

My experience with messign up MySQL charset was not 100% functional sorting of strings. You would be better with having everything in UTF-8 to be on the safe side.

I think it depends on what you actually store in that columns. If you store UTF-8 multi-byte characters in a column with latin-1 charset you might run into the sorting troubles. But as longs as there are only EN/US characters you should be ok.

Upvotes: 0

Related Questions