Reputation: 195
So, I've had some issues while trying to come over from Latin1
encoded databases, tables as well as columns, and now that everything is finally in UTF-8, I can't seem to update a row in a column. I am trying to replace an "e" with an e with acute (é). But it gives me this:
ERROR 1366 (HY000): Incorrect string value: '\x82m ...' for column 'Name' at row 1
when running this:
UPDATE access SET Name='ém' WHERE id="2";
All databases gives me this when running the status command (except the current database
part of course):
Connection id: 1
Current database: access
Current user: root@localhost
SSL: Not in use
Using delimiter: ;
Server version: 5.1.47-community MySQL Community Server (GPL)
Protocol version: 10
Connection: localhost via TCP/IP
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
TCP port: 3306
Uptime: 20 min 16 sec
Threads: 1 Questions: 110 Slow queries: 0 Opens: 18 Flush tables: 1 Open tables: 11 Queries per second avg: 0.90
And running the chcp command in cmd gives me 850
. Oh, and at some points I got this:
ERROR 1300 (HY000): Invalid utf8 character string: 'ém' WHERE id="2"
I've looked everywhere for a solution, but I couldn't seem to find anything anywhere, and since I've always had good responses on Stackoverflow, I thought I'd ask here.
Thanks for any help!
Upvotes: 4
Views: 30962
Reputation: 4139
The solution is to set the connection variables to whatever codepage your installation of windows uses (not latin1 like what a lot of pages out there recommend - cmd.exe's character encoding isn't latin1).
In my case the codepage is 850:
mysql> SET NAMES cp850;
Here's an example with the connection set to UTF-8:
mysql> show variables like '%char%';
+--------------------------+---------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)
This is what happens to accented characters:
mysql> select nom from assignatura where nom like '%prob%';
+---------------------------------------+
| nom |
+---------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+---------------------------------------+
2 rows in set (0.03 sec)
Notice the extraneous ├
character just before the á
. Also the accent is the wrong direction, it should be à
.
After executing SET NAMES cp850;
:
mysql> show variables like '%char%';
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
We finally get the correct accented character:
mysql> select nom from assignatura where nom like '%prob%';
+--------------------------------------+
| nom |
+--------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+--------------------------------------+
2 rows in set (0.00 sec)
Upvotes: 6
Reputation: 59633
Well ... 0x82 is e-acute in code page 850. It would be 0xE9 in ISO-8859-1 which makes it something like 0xD0 0xB4 in UTF-8. I don't know if there is a good way to get a DOS window to handle UTF-8 input correctly. Here is an alternative if you are using the command line client. You can set the client character set to match whatever your local code page is and let the mysql library take care of the transcoding for you:
c:\> mysql --default-character-set=cp850
mysql> \s
--------------
mysql Ver 14.14 Distrib 5.1.34, for apple-darwin9.6.0 (i386) using readline 5.2
Connection id: 17
Current database:
Current user: daveshawley@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.1.34-log Source distribution
Protocol version: 10
Connection: localhost via TCP/IP
Server characterset: ucs2
Db characterset: ucs2
Client characterset: cp850
Conn. characterset: cp850
TCP port: 3306
Uptime: 19 days 8 hours 37 min 55 sec
Threads: 2 Questions: 248 Slow queries: 0 Opens: 71 Flush tables: 1 Open tables: 64 Queries per second avg: 0.0
--------------
I know that this works for the combination of latin1
in one window and utf8
in another window on my MacBook. I also verified that an ALTER TABLE ... CONVERT TO CHARACTER SET ucs2
did the right thing.
Upvotes: 0
Reputation: 25018
This thread, although somewhat old, seems to result in the conclusion that cmd.exe and the mysql client don't handle UTF-8 encoding properly (with the blame being more aimed at cmd.exe).
Reading in SQL from a file is recommended, as is using an alternative client - or a flavour of UNIX. :)
Upvotes: 3
Reputation: 229342
When you input stuff on the command line, the strings will be in whatever character set the terminal uses. Why the mysql client doesn't translate that before sending it to the db still puzzles me, but it doesn't. You're probably sending latin1 to the db.
You could save your update SQL in a text file, make sure that text file is UTF-8, and run
something like type myfile.txt | mysql db_name
Upvotes: 0