Reputation: 857
When I run the following program:
public static void main(String args[]) throws Exception
{
    byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4};
    String s = new String(str, "UTF-8");
}
on Linux and inspect the value of s in jdb, I correctly get:
s = "ì–´"
on Windows, I incorrectly get:
s = "?"
My byte sequence is the valid UTF-8 encoding of a Korean character. Why would it produce two very different results?
Upvotes: 2
Views: 4463
Reputation: 1744
JDB is displaying the data incorrectly. The code works the same on both Windows and Linux. Try running this more definitive test:
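import java.math.BigInteger;

public class Utf8Test {
    public static void main(String[] args) throws Exception {
        byte[] str = {(byte)0xEC, (byte)0x96, (byte)0xB4};
        String s = new String(str, "UTF-8");
        // Print each UTF-16 code unit as hex so the console encoding cannot affect the result
        for (int i = 0; i < s.length(); i++) {
            System.out.println(BigInteger.valueOf((int) s.charAt(i)).toString(16));
        }
    }
}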
This prints the hex value of every character in the string. It prints "c5b4" on both Windows and Linux, which shows that the string itself was decoded correctly.
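To see where "c5b4" comes from, the three bytes can also be decoded by hand. The following is only a sketch: the class and method names are made up for illustration, and it assumes the input is a well-formed three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) without validating it.

```java
// Hypothetical helper, not part of the JDK: decodes one three-byte UTF-8 sequence by hand.
class Utf8ManualDecode {
    // Assumes b0 is a 1110xxxx lead byte and b1, b2 are 10xxxxxx continuation bytes.
    static int decodeThreeBytes(byte b0, byte b1, byte b2) {
        return ((b0 & 0x0F) << 12)   // low 4 bits of the lead byte
             | ((b1 & 0x3F) << 6)    // low 6 bits of the first continuation byte
             | (b2 & 0x3F);          // low 6 bits of the second continuation byte
    }

    public static void main(String[] args) {
        int codePoint = decodeThreeBytes((byte) 0xEC, (byte) 0x96, (byte) 0xB4);
        System.out.println(Integer.toHexString(codePoint)); // prints c5b4
    }
}
```

The result matches what new String(str, "UTF-8") produces, so the decoding step is the same on every platform; only the display differs.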
Upvotes: 0
Reputation: 340933
It correctly prints "어" on my computer (Ubuntu Linux), as described in Code Table Korean Hangul. The Windows command prompt is known to have issues with encoding, so don't worry about it.
Your code is fine.
Upvotes: 3
Reputation: 726987
You get the correct string; it is the Windows console that does not display it correctly.
Here is a link to an article that discusses a way to make Java console produce correct Unicode output using JNI.
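One way to convince yourself that the string is correct without trusting any console is a round-trip check. This is a minimal sketch with a made-up class name: it decodes the bytes, encodes the string back to UTF-8, and compares the result with the input. It also prints the JVM's default charset, which is often where Windows and Linux setups differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class RoundTripCheck {
    // Returns true if decoding the bytes as UTF-8 and re-encoding yields the same bytes.
    static boolean roundTrips(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return Arrays.equals(utf8, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] str = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        System.out.println(roundTrips(str)); // prints true
        // The default charset depends on the platform and JVM settings.
        System.out.println(Charset.defaultCharset());
    }
}
```

If the round trip succeeds on both machines, the difference you see is in how the debugger or console renders the character, not in the String.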
Upvotes: 1
Reputation: 597342
It gives 어 for me. This means your console is probably not configured to display UTF-8, so it is a printing/display problem rather than a problem with the string's representation.
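If you want output that does not depend on the console at all, you can print the string as ASCII-only escapes, and separately try writing it through a PrintStream that encodes as UTF-8. This is a sketch, not a guaranteed fix: the class and method names are invented here, and whether the UTF-8 line displays correctly still depends on the terminal (on Windows, I believe the code page has to be switched to UTF-8, for example with chcp 65001, and the font has to contain the glyph).

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class for illustration only.
class ConsoleSafePrint {
    // Renders every UTF-16 code unit as a \\uXXXX escape, so the output is pure ASCII.
    static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] str = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        String s = new String(str, "UTF-8");

        // ASCII-only view of the string; safe on any console.
        System.out.println(toUnicodeEscapes(s));

        // Writes the string as UTF-8 bytes regardless of the default charset.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(s);
    }
}
```

The escaped form is the more reliable diagnostic, since it shows the code unit value even when the terminal cannot render Hangul.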
Upvotes: 1