Reputation: 8107
I have a record in MySQL such as
Váºn hà nh linh hoạt trong má»i Ä‘k giao thông
which in hex is:
56 c3 a1 c2 ba c2 ad 6e 20 68 c3 83 c2 a0 6e 68 20 6c 69 6e 68 20 68 6f c3 a1 c2 ba c2 a1 74 20 74 72 6f 6e 67 20 6d c3 a1 c2 bb c2 8d 69 20 c3 84 e2 80 98 6b 20 67 69 61 6f 20 74 68 c3 83 c2 b4 6e 67 20
I don't know how PHP saved it, but reading it through the Java MySQL Connector shows strange characters. I can recover the original text like this:
copy the text above --> Notepad++ --> Encoding in ASCII --> paste the text --> Encoding in UTF-8
the original text should be:
Vận hành linh hoạt trong mọi đk giao thông
I know the problem is that PHP saved the text in the wrong encoding, but is there a way to decode it correctly in Java?
Upvotes: 0
Views: 352
Reputation: 11114
Are you sure the hex is exactly correct? Here is what I did...
// Needs: import java.io.UnsupportedEncodingException;
String MESS = "56 c3 a1 c2 ba c2 ad 6e 20 68 c3 83 c2 a0 6e 68 20 6c 69 6e 68 20 68 6f c3 a1 c2 ba c2 a1 74 20 74 72 6f 6e 67 20 6d c3 a1 c2 bb c2 8d 69 20 c3 84 e2 80 98 6b 20 67 69 61 6f 20 74 68 c3 83 c2 b4 6e 67 20";
// Turn the space-separated hex pairs into raw bytes
String[] hexchars = MESS.split(" ");
byte[] buf = new byte[hexchars.length];
for (int i = 0; i < hexchars.length; i++) {
    buf[i] = (byte) Integer.parseInt(hexchars[i], 16);
}
try {
    String s1 = new String(buf, "UTF-8"); // First decode the bytes as UTF-8
    buf = s1.getBytes("cp1252");          // ...then encode the characters as cp1252
    s1 = new String(buf, "UTF-8");        // ...then decode those bytes as UTF-8 again
    System.out.println(s1);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
And the printed result is:
Vận hành linh hoạt trong m�?i đk giao thông
Which is almost right, except that mọi is decoded incorrectly. That makes me suspect the hex you provided may not be correct. If you are 100% sure it is correct, I can try a little more to decode it.
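One possible explanation for the broken mọi, offered as an assumption rather than something confirmed in this thread: the hex contains c2 8d, which is the character U+008D. MySQL's latin1 is documented as cp1252 with the five undefined bytes (0x81, 0x8d, 0x8f, 0x90, 0x9d) mapped to the control characters with the same code points, while Java's cp1252 encoder has no mapping for U+008D and substitutes '?'. If that is what happened here, the hex may be correct after all, and writing the C1 control characters back as their own byte values is a minimal sketch of a fix. The class and method names below (MojibakeRepair, parseHex, repair) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, assuming the text was double-encoded via MySQL latin1.
final class MojibakeRepair {

    /** Parses space-separated hex pairs such as "56 c3 a1" into bytes. */
    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Undoes one round of "UTF-8 bytes misread as latin1 and re-encoded as UTF-8". */
    static String repair(byte[] doubleEncoded) {
        String garbled = new String(doubleEncoded, StandardCharsets.UTF_8);
        Charset cp1252 = Charset.forName("windows-1252");
        CharsetEncoder encoder = cp1252.newEncoder();
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = 0; i < garbled.length(); i++) {
            char c = garbled.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                // C1 control character: assume the original byte equals the code point
                raw.write(c);
            } else if (encoder.canEncode(c)) {
                // Ordinary cp1252 character: map it back to its single byte
                raw.write(String.valueOf(c).getBytes(cp1252)[0]);
            } else {
                // Not representable in one byte; keep a visible placeholder
                raw.write('?');
            }
        }
        return new String(raw.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hex = "56 c3 a1 c2 ba c2 ad 6e 20 68 c3 83 c2 a0 6e 68 20 6c 69 6e 68 20 "
                + "68 6f c3 a1 c2 ba c2 a1 74 20 74 72 6f 6e 67 20 6d c3 a1 c2 bb c2 8d 69 20 "
                + "c3 84 e2 80 98 6b 20 67 69 61 6f 20 74 68 c3 83 c2 b4 6e 67 20";
        System.out.println(repair(parseHex(hex)));
    }
}
```

The only difference from the snippet above is the first branch: instead of letting the cp1252 encoder replace U+008D with '?', the byte 0x8d is written directly, so the sequence e1 bb 8d survives and can decode to ọ. This only helps if the data really went through a latin1 round trip; text damaged in some other way would need a different approach.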
UPDATE: Here are my further thoughts:
Only then will there be a possibility of setting the MySQL Connector/J to the right encoding, and then possibly applying a second conversion in Java.
Upvotes: 1