Geln Yang
Geln Yang

Reputation: 912

Can't get correct value to query Chinese value from latin1 field Using JDBC for MySQL

It can't get correct value to query Chinese value from latin1 field (varchar or char) Using JDBC for MySQL. The character can't be changed. The following is the test steps. Is it possible to get the correct value for this situation? What's more should I do?

1. create database:

CREATE DATABASE TESTDB CHARACTER SET latin1 COLLATE latin1_general_ci;  

2. create table:

CREATE TABLE TB1 (
    vname varchar(50) default '',
    cname char(50) default ''
) DEFAULT CHARSET=latin1 ;

3. insert data:

set names latin1;  
insert into TB1(vname,cname) values('名字v','名字c');  

4. show character set:

mysql> show variables like '%set%';  
+--------------------------+---------------------------------------------------------+  
| Variable_name            | Value                                                   |  
+--------------------------+---------------------------------------------------------+  
| auto_increment_offset    | 1                                                       |  
| character_set_client     | latin1                                                  |  
| character_set_connection | latin1                                                  |  
| character_set_database   | latin1                                                  |  
| character_set_filesystem | binary                                                  |  
| character_set_results    | latin1                                                  |  
| character_set_server     | utf8                                                    |  
| character_set_system     | utf8                                                    |  
| character_sets_dir       | F:\Program Files\MySQL\MySQL Server 5.5\share\charsets\ |  
+--------------------------+---------------------------------------------------------+  

5. it can get correct value when querying from command console:

mysql> select * from tb1;  
+-------+-------+  
| vname | cname |  
+-------+-------+  
| 名字v     | 名字c     |  
+-------+-------+  
1 row in set (0.00 sec)  

6. can't get the correct valuing when using JDBC to query:
jdbc url : jdbc:mysql://192.168.5.74/testdb?characterEncoding=UTF-8

*set names utf8;  
select vname,hex(vname),length(vname),char_length(vname) from tb1;  
select cname,hex(cname),length(cname),char_length(cname) from tb1;  
select vname,cname  
    ,CONVERT(CONVERT(CONVERT(vname USING latin1) USING binary) USING utf8) as c1  
    ,CONVERT(CONVERT(CONVERT(cname USING latin1) USING binary) USING utf8) as c2  
from tb1;*  

vname           hex(vname)     length(vname) char_length(vname)   
--------------- -------------- ------------- ------------------   
??×?v           C3FBD7D676     5             5                    

cname           hex(cname)     length(cname) char_length(cname)   
--------------- -------------- ------------- ------------------   
??×?c           C3FBD7D663     5             5                    

vname             cname     c1      c2      
----------------- --------- ------- ------  
??×?v             ??×?c                 

Upvotes: 2

Views: 3276

Answers (3)

chooban
chooban

Reputation: 9256

Another, simpler way of getting the desired characters would be to get the bytes from the column and then convert those into a string inside the application.

Pseudocode...

byte[] rawBytes = resultSet.getBytes( "vname" );
String vname = new String( rawBytes, "gb2312" );

A very useful way to check and see if mysql is showing you the correct string in the monitor by accident or design is to use the length functions.

select vname, length( vname ) as bytelength, char_length( vname ) as stringlength from tb1;

In addition, the hex() function is also your friend:

select vname, hex( vname ) from tb1;

That'll show you the hex values of the bytes that are stored in the column so you can then look those up (yay, Wikipedia!) to see if they match the displayed value in the various different ways of viewing it.

Upvotes: 0

Geln Yang
Geln Yang

Reputation: 912

I resolve it. The multi-bytes characters are converted to bytes when storing in latin1 field. It need to convert back to the inserting charset. The following sql can do this:

CONVERT(CONVERT(CONVERT(vname USING latin1) USING binary) USING [INSERT_CHARSET]) 

My inserting charset is gb2312, so the sql should be :

CONVERT(CONVERT(CONVERT(vname USING latin1) USING binary) USING gb2312) 

Upvotes: 4

hungneox
hungneox

Reputation: 9829

You should change latin1_general_ci; to utf8_general_ci or utf16_general_ci. The problem is CHARSET=latin1 cannot store unicode characters like Chinese characters.

Check out this: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-utf8.html

The idea of UTF-8 is that various Unicode characters are encoded using byte sequences of different lengths:

Basic Latin letters, digits, and punctuation signs use one byte.

Most European and Middle East script letters fit into a two-byte sequence: 

extended Latin letters (with tilde, macron, acute, grave and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.

**Korean, Chinese, and Japanese ideographs use three-byte sequences.**

Latin1 charset is only 8bit single byte while Chinese script require multi-bytes

Upvotes: 2

Related Questions