Reputation: 9590
This stumps me. I'm upgrading a fairly large app (for me) from Rails 2.3 to Rails 3.0. I'm also running it on Ruby 1.9.2 instead of 1.8.7, and on top of that I've switched to HTML5. So there are many variables in play.
In several pages, the text coming from the MySQL database just does not display right anymore. This can be as simple as the euro symbol (€) or as esoteric as some Sanskrit text: सर्वम् मंगलम्
While everything looked great on the old site now I get some garbage characters such as € instead of the euro sign or the following:
‡§∏‡§∞‡•ç‡§µ‡§Æ‡•ç ‡§Æ‡§Ç‡§ó‡§≤‡§Æ‡•ç
... instead of the Sanskrit text.
The data in the database is unchanged. As far as I know everything is set up for utf-8 everywhere.
What gives?
Edit 1, following up on Roland's help:
Here is what I get from the MySQL databases on my Ubuntu server:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
But here is what I get from running the same command on my local Mac:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name            | Value                                                |
+--------------------------+------------------------------------------------------+
| character_set_client     | utf8                                                 |
| character_set_connection | utf8                                                 |
| character_set_database   | utf8                                                 |
| character_set_filesystem | binary                                               |
| character_set_results    | utf8                                                 |
| character_set_server     | utf8                                                 |
| character_set_system     | utf8                                                 |
| character_sets_dir       | /usr/local/Cellar/mysql/5.5.14/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
The second listing looks better to me, though I don't understand encodings very well.
Should I modify the settings of my server databases? Won't that mess up their existing data? If not, how do I go about changing the character set variables?
Upvotes: 2
Views: 1180
Reputation: 41617
If you take the garbled string as Unicode text, write it out as UTF-8, and then convert that byte stream from UTF-8 to MacRoman, you get the right bytes: they are the UTF-8 encoding of the original string.
I did this (in a UTF-8 terminal):
$ echo '‡§∏‡§∞‡•ç‡§µ‡§Æ‡•ç ‡§Æ‡§Ç‡§ó‡§≤‡§Æ‡•ç' > in
$ iconv -f UTF-8 -t MacRoman < in
सर्वम् मंगलम्
So somewhere the opposite conversion is being applied to your data: the UTF-8 byte stream is interpreted as MacRoman and then converted to UTF-8 again.
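The same round trip can be sketched in Python, which may make it easier to test other candidate encodings. This is only an illustration, not part of the Rails app: the helper name `undo_misread` is made up here, and `mac_roman` and `cp1252` are Python's codec names for MacRoman and for the Windows code page that MySQL's latin1 is based on.

```python
# Sketch: reproduce the garbling, then undo it.

original = "सर्वम् मंगलम्"

# What the broken pipeline appears to do: take the UTF-8 bytes and
# read every single byte as one MacRoman character.
garbled = original.encode("utf-8").decode("mac_roman")

# The repair, equivalent to `iconv -f UTF-8 -t MacRoman`: map each
# character back to its MacRoman byte, then read the bytes as UTF-8.
restored = garbled.encode("mac_roman").decode("utf-8")


def undo_misread(garbled_text, codec):
    """Return the repaired text, or None if `codec` cannot explain the garbling."""
    try:
        return garbled_text.encode(codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    print(garbled)
    print(restored == original)
    # Try a few candidate encodings to see which one explains the garbage.
    for codec in ("mac_roman", "cp1252", "latin-1"):
        print(codec, undo_misread(garbled, codec))
```

A codec for which `undo_misread` returns the expected text is a good candidate for the encoding that some layer wrongly assumes. It does not say which layer (database connection, driver, or application) is doing the misreading.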
Upvotes: 5