Reputation: 7145
I am having trouble reading international characters in Java.
The default character set being used is UTF-8 and my Eclipse workspace is also set to this.
I am reading a title of a video from the Internet (Gangam Style in fact ;) ) which contains Korean characters, I am doing this as follows:
BufferedReader stdIn = new BufferedReader(new InputStreamReader(shellCommand.getInputStream()));
String fileName = null, output = null;
while ((output = stdInput.readLine()) != null) {
if (output.indexOf("Destination") > 0) {
System.out.println(output);
I know that the title it will read is: "PSY - GANGNAM STYLE (강남스타일) M/V", but the console displays the following instead: "PSY - GANGNAM STYLE () M V" which causes errors further along in my program.
It seems like the InputStream Reader isn't reading these characters correctly.
Does anyone have any ideas? I've spent the last hour scouring the Internet and haven't found any answers. Thanks in advance everyone.
Upvotes: 0
Views: 932
Reputation: 19185
You need to enure default char-set using Charset.defaultCharset().name()
else use
InputStreamReader in = new InputStreamReader(shellCommand.getInputStream(), "UTF-8");
I tried sample program and it prints correctly in eclipse. It might be problem of windows console as AlexR has pointed out.
byte[] bytes = "PSY - GANGNAM STYLE (강남스타일) M/V".getBytes();
InputStreamReader reader = new InputStreamReader(new ByteArrayInputStream(bytes));
BufferedReader bufferedReader = new BufferedReader(reader);
String str = bufferedReader.readLine();
System.out.println(str);
Output:
PSY - GANGNAM STYLE (강남스타일) M/V
Upvotes: 1
Reputation: 1500745
The default character set being used is UTF-8
The default where? In Java itself, or in the video? It would be a much clearer if you specified this explicitly. You should check that's correct for the video data too.
It seems like the InputStream Reader isn't reading these characters correctly.
Well, all we know is that the text isn't showing properly on the console. Either it isn't being read correctly, or it's not being displayed correctly. You should print out each character's Unicode value so you can check the exact content of the string. For example:
static void logCharacters(String text) {
for (int i = 0; i < text.length(); i++) {
char c = text.charAt(i);
System.out.println(c + " " + Integer.toHexString(c));
}
}
Upvotes: 2