Reputation: 11430
I am working on a website with MySQL database on a Linux server.
Using phpMyAdmin, on the database, it says
However, I have created all the tables with InnoDB and utf8_unicode_ci. I also checked that the table fields for all tables is utf8_unicode_ci.
Yet, when I mysql_fetch_array, and echo to stream, it gives gibberish. I had to explicitly set mysql_set_charset('utf8')
for the text to appear correctly.
PHP version is 5.3.9; MySQL version is 5.1.70-cll - MySQL Community Server (GPL).
This is the first time I encountered this problem and I never had to set charset before.
What caused the text fetched by php mysql_* to be gibberish? Under what circumstance is it necessary to mysql_set_charset
?
EDIT: This is not a question to attract suggestion to use alternative library e.g. mysqli, pdo. I just want to understand about this current situtation about the behavior of MySQL and charsets. Thanks.
Upvotes: 1
Views: 3848
Reputation: 522085
When exchanging data between two systems, there's always the question "what encoding will text be sent in?" "Text" is represented simply as binary data, just long strings of 1s and 0s. These could mean anything at all. There are hundreds of encoding schemes to encode different characters into different sequences of 1 and 0. If a system just receives a string of those without being told what encoding they represent, the system cannot know what characters those supposedly are.
Therefore, for any interface between two system, there needs to be a specification for what encoding strings are in. For MySQL, that's the API call mysql_set_charset
. This is the way to tell MySQL what encoding strings will be in that PHP sends to it, and what encoding MySQL should returns strings in back to PHP. Without setting this explicitly some default encoding is assumed, which may not be the same encoding you're expecting, creating a mismatch and garbage characters.
Read What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text and Handling Unicode Front To Back In A Web App for more information.
Upvotes: 3
Reputation: 850
mysql_set_charset
functions sets the default character set for the current connection. Even though your data is stored in unicode on the server, it still requires a compatible connection character set to transmit data accurately.
If you execute SHOW VARIABLES LIKE 'character\_set\_%'
statement in mysql it will show various sharacter sets used by the server and current connection. Ideally they should all match and be utf8
.
More information: MySQL Connection Character Sets.
Upvotes: 1
Reputation: 75629
It's wise to always call it once connection is established, to ensure your app will not be affected by broken server settings. Because you can have your tables in i.e. UTF8 and send your data in UTF8 but if the connection is not UTF8 (because of i.e my.ini
settings) then you end up with mess. So either call mysql_set_charset()
or execute SET NAMES charset
query, and you will be on safe ground. And since it is done once per connection, it's basically no cost operation anyway
Upvotes: 2