Nisto
Nisto

Reputation: 195

Special characters won't work in MySQL (UTF-8)

So, I've had some issues while trying to come over from Latin1 encoded databases, tables as well as columns, and now that everything is finally in UTF-8, I can't seem to update a row in a column. I am trying to replace an "e" with an e with acute (é). But it gives me this:

ERROR 1366 (HY000): Incorrect string value: '\x82m ...' for column 'Name' at row 1

when running this:

UPDATE access SET Name='ém' WHERE id="2";

All databases gives me this when running the status command (except the current database part of course):


Connection id:          1  
Current database:       access  
Current user:           root@localhost  
SSL:                    Not in use  
Using delimiter:        ;  
Server version:         5.1.47-community MySQL Community Server (GPL)  
Protocol version:       10  
Connection:             localhost via TCP/IP  
Server characterset:    utf8  
Db     characterset:    utf8  
Client characterset:    utf8  
Conn.  characterset:    utf8  
TCP port:               3306  
Uptime:                 20 min 16 sec  

Threads: 1 Questions: 110 Slow queries: 0 Opens: 18 Flush tables: 1 Open tables: 11  Queries per second avg: 0.90

And running the chcp command in cmd gives me 850. Oh, and at some points I got this:

ERROR 1300 (HY000): Invalid utf8 character string: 'ém' WHERE id="2"

I've looked everywhere for a solution, but I couldn't seem to find anything anywhere, and since I've always had good responses on Stackoverflow, I thought I'd ask here.

Thanks for any help!

Upvotes: 4

Views: 30962

Answers (4)

bob esponja
bob esponja

Reputation: 4139

The solution is to set the connection variables to whatever codepage your installation of windows uses (not latin1 like what a lot of pages out there recommend - cmd.exe's character encoding isn't latin1).

In my case the codepage is 850:

mysql> SET NAMES cp850;

Here's an example with the connection set to UTF-8:

mysql> show variables like '%char%';
+--------------------------+---------------------------------+
| Variable_name            | Value                           |
+--------------------------+---------------------------------+
| character_set_client     | utf8                            |
| character_set_connection | utf8                            |
| character_set_database   | utf8                            |
| character_set_filesystem | binary                          |
| character_set_results    | utf8                            |
| character_set_server     | utf8                            |
| character_set_system     | utf8                            |
| character_sets_dir       | C:\xampp\mysql\share\charsets\  |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)

This is what happens to accented characters:

mysql> select nom from assignatura where nom like '%prob%';
+---------------------------------------+
| nom                                   |
+---------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+---------------------------------------+
2 rows in set (0.03 sec)

Notice the extraneous character just before the á. Also the accent is the wrong direction, it should be à.

After executing SET NAMES cp850;:

mysql> show variables like '%char%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | cp850                          |
| character_set_connection | cp850                          |
| character_set_database   | utf8                           |
| character_set_filesystem | binary                         |
| character_set_results    | cp850                          |
| character_set_server     | utf8                           |
| character_set_system     | utf8                           |
| character_sets_dir       | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)

We finally get the correct accented character:

mysql> select nom from assignatura where nom like '%prob%';
+--------------------------------------+
| nom                                  |
+--------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+--------------------------------------+
2 rows in set (0.00 sec)

Upvotes: 6

D.Shawley
D.Shawley

Reputation: 59633

Well ... 0x82 is e-acute in code page 850. It would be 0xE9 in ISO-8859-1 which makes it something like 0xD0 0xB4 in UTF-8. I don't know if there is a good way to get a DOS window to handle UTF-8 input correctly. Here is an alternative if you are using the command line client. You can set the client character set to match whatever your local code page is and let the mysql library take care of the transcoding for you:

c:\> mysql --default-character-set=cp850
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.34, for apple-darwin9.6.0 (i386) using readline 5.2

Connection id:         17
Current database:
Current user:          daveshawley@localhost
SSL:                   Not in use
Current pager:         stdout
Using outfile:         ''
Using delimiter:       ;
Server version:        5.1.34-log Source distribution
Protocol version:      10
Connection:            localhost via TCP/IP
Server characterset:   ucs2
Db     characterset:   ucs2
Client characterset:   cp850
Conn.  characterset:   cp850
TCP port:              3306
Uptime:                19 days 8 hours 37 min 55 sec

Threads: 2  Questions: 248  Slow queries: 0  Opens: 71  Flush tables: 1  Open tables: 64  Queries per second avg: 0.0
--------------

I know that this works for the combination of latin1 in one window and utf8 in another window on my MacBook. I also verified that an ALTER TABLE ... CONVERT TO CHARACTER SET ucs2 did the right thing.

Upvotes: 0

Will A
Will A

Reputation: 25018

This thread, although somewhat old, seems to result in the conclusion that cmd.exe and the mysql client don't handle UTF-8 encoding properly (with the blame being more aimed at cmd.exe).

Reading in SQL from a file is recommended, as is using an alternative client - or a flavour of UNIX. :)

Upvotes: 3

nos
nos

Reputation: 229342

When you input stuff on the command line, the strings will be in whatever character set the terminal uses. Why the mysql client doesn't translate that before sending it to the db still puzzles me, but it doesn't. You're probably sending latin1 to the db.

You could save your update SQL in a text file, make sure that text file is UTF-8, and run something like type myfile.txt | mysql db_name

Upvotes: 0

Related Questions