Reputation: 687
I have a MySQL database which is as far as I can tell utf8 through and through. It's a database originally used solely by a Ruby on Rails application which has no issue writing and reading the utf8 characters.
However, when attempting to read via a c# app using Dapper I'm seeing a lot of bad characters, for example:
Expected: FELIZ AÑO
Actual: FELIZ AÑO
My connection string looks like this:
Server=;Database=;Uid=;Pwd=;Port=;SslMode=;charset=utf8;
I've tried several combinations of charset and utf8 capitalisation but none have worked so far. I read in one thread that the charset in the connection string only affects the SQL query language so if that's correct that'll be why its not helping!
Is there anything else I'm missing potentially? or anything else I can do to be able to read the characters correctly?
Update: Some info from MySQL:
mysql> SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME, COLUMN_TYPE
-> FROM information_schema.columns
-> WHERE TABLE_NAME = 'pages'
-> AND COLUMN_NAME = 'title';
| TABLE_NAME | COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME | COLUMN_TYPE |
| pages | title | utf8 | utf8_general_ci | varchar(255) |
Update 2: More info, it seems my character sets are a bit messed up..
mysql> show variables like "character_set_%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> select collation_name from information_schema.columns where table_name = 'pages' and column_name = 'title';
+-----------------+
| collation_name |
+-----------------+
| utf8_general_ci |
+-----------------+
mysql> select title from pages where id = 3660;
+--------------------------------+
| title |
+--------------------------------+
| FELIZ AÑO |
+--------------------------------+
mysql> set character set 'latin1';
mysql> select title from pages where id = 3660;
+-----------------------------+
| title |
+-----------------------------+
| FELIZ AÑO |
+-----------------------------+
So, database is latin1, default connection is utf8, default client is utf8, column is utf8.
If I query the column directly in mysql, it comes out wrong until I set the character set to latin1.
Doing the exact same thing in C# with Execute to set the character set and ExecuteReader to read the column is still producing the bad characters. Trying to work out the difference.
Update 3 - Screenshot for @BradleyGrainger, this is from SequelPro
Upvotes: 1
Views: 2035
Reputation: 142298
OK, it's just Mojibake.
HEX: 46 45 4C 49 5A 20 41 C383 E28098 4F
F E L I Z (sp) A Ã ‘ O
Mojibaked: FELIZ AÑO
Should be: FELIZ AÑO
The instructions for preventing it is in "Mojibake" of Trouble with UTF-8 characters; what I see is not what I stored
After preventing it, one way to "fix" the data would involve an UPDATE
to change the column contents thus:
col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)
You can see that via:
SELECT CONVERT(BINARY(CONVERT('FELIZ AÑO' USING latin1)) USING utf8mb4); -- FELIZ AÑO
(Please experiment in a test environment, not production.)
Upvotes: 2