Cristian

Reputation: 7145

Java Loses International Characters in Stream

I am having trouble reading international characters in Java.

The default character set being used is UTF-8 and my Eclipse workspace is also set to this.

I am reading the title of a video from the Internet (Gangnam Style, in fact ;) ), which contains Korean characters. I am doing this as follows:

// shellCommand is the Process whose output is being read
BufferedReader stdInput = new BufferedReader(new InputStreamReader(shellCommand.getInputStream()));
String fileName = null, output = null;
while ((output = stdInput.readLine()) != null) {
    if (output.indexOf("Destination") > 0) {
        System.out.println(output);
    }
}

I know that the title it will read is: "PSY - GANGNAM STYLE (강남스타일) M/V", but the console displays the following instead: "PSY - GANGNAM STYLE () M V" which causes errors further along in my program.

It seems like the InputStreamReader isn't reading these characters correctly.

Does anyone have any ideas? I've spent the last hour scouring the Internet and haven't found any answers. Thanks in advance everyone.

Upvotes: 0

Views: 932

Answers (2)

Amit Deshpande

Reputation: 19185

You need to make sure the default charset is what you expect, which you can check with Charset.defaultCharset().name(). Otherwise, pass the encoding explicitly:

InputStreamReader in = new InputStreamReader(shellCommand.getInputStream(), "UTF-8");
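
To make the explicit-charset idea concrete, here is a minimal sketch of a line reader that always decodes as UTF-8. The class and method names (Utf8LineReader, readLines) are made up for illustration, and it assumes the external command really emits UTF-8; if it writes in some other encoding, that charset would have to be passed instead. The Korean part of the title is written with \u escapes so the example does not depend on the source file's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8LineReader {
    // Hypothetical helper: reads every line from the stream, decoding as UTF-8
    // regardless of the platform default charset.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The escapes spell the Hangul part of the title from the question.
        String title = "PSY - GANGNAM STYLE (\uAC15\uB0A8\uC2A4\uD0C0\uC77C) M/V";
        // Stand-in for shellCommand.getInputStream(): the same text as UTF-8 bytes.
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(utf8));
        // Compare in memory rather than by eye, so console rendering is not a factor.
        System.out.println(lines.get(0).equals(title));
    }
}
```

Using the StandardCharsets.UTF_8 constant instead of the "UTF-8" string also avoids the checked UnsupportedEncodingException.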

I tried a sample program and it prints correctly in Eclipse. It might be a problem with the Windows console, as AlexR has pointed out.

// getBytes() and InputStreamReader with no charset argument both use the platform default charset
byte[] bytes = "PSY - GANGNAM STYLE (강남스타일) M/V".getBytes();
InputStreamReader reader = new InputStreamReader(new ByteArrayInputStream(bytes));
BufferedReader bufferedReader = new BufferedReader(reader);
String str = bufferedReader.readLine();
System.out.println(str);

Output:

 PSY - GANGNAM STYLE (강남스타일) M/V
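
If the console is the suspect, one thing to try is printing through a PrintStream that encodes as UTF-8 instead of relying on whatever System.out was set up with. This is only a sketch: Utf8Console and its utf8 method are hypothetical names, and it can only help if the terminal itself interprets UTF-8 and has a font with Hangul glyphs, which is outside Java's control.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    // Hypothetical helper: wraps an output stream in an auto-flushing
    // PrintStream that encodes characters as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String title = "PSY - GANGNAM STYLE (\uAC15\uB0A8\uC2A4\uD0C0\uC77C) M/V";
        PrintStream console = utf8(System.out);
        console.println(title);
    }
}
```

If the title still shows up wrong after this, the bytes leaving Java are fine and the remaining problem is in how the console decodes or renders them.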

Upvotes: 1

Jon Skeet

Reputation: 1500745

The default character set being used is UTF-8

The default where? In Java itself, or in the video? It would be much clearer if you specified the encoding explicitly. You should check that it's correct for the video data too.
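
To see what the JVM itself considers the default, as opposed to what the IDE is configured to use for source files, something like the following can be run in the same launch configuration as the real program. DefaultCharsetReport is a made-up name, and the two values it reports may or may not match your workspace setting.

```java
import java.nio.charset.Charset;

final class DefaultCharsetReport {
    // Hypothetical helper: summarises the charset the running JVM uses by default.
    static String describe() {
        return "Charset.defaultCharset() = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

This only tells you how Java will decode when no charset is given. Whether the data coming from the external command is actually in that encoding is a separate question.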

It seems like the InputStreamReader isn't reading these characters correctly.

Well, all we know is that the text isn't showing properly on the console. Either it isn't being read correctly, or it's not being displayed correctly. You should print out each character's Unicode value so you can check the exact content of the string. For example:

static void logCharacters(String text) {
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        System.out.println(c + " " + Integer.toHexString(c));
    }
}
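
A variant of the same idea that returns the values as a single string, which is easier to paste into a log or compare in a test. CodePointDump and toCodePoints are hypothetical names, and this version walks code points rather than chars so that characters outside the Basic Multilingual Plane would show up as one value each.

```java
import java.nio.charset.StandardCharsets;

final class CodePointDump {
    // Hypothetical helper: renders each code point as U+XXXX, separated by spaces.
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // First two Hangul syllables of the title, written as escapes.
        String good = "\uAC15\uB0A8";
        System.out.println(toCodePoints(good)); // U+AC15 U+B0A8

        // Simulate a wrong read: UTF-8 bytes decoded with a charset that
        // cannot represent them. Invalid bytes are replaced with U+FFFD.
        String bad = new String(good.getBytes(StandardCharsets.UTF_8), StandardCharsets.US_ASCII);
        System.out.println(toCodePoints(bad));
    }
}
```

If the dump of your string shows the expected Hangul values, the read was fine and the display is at fault. If it shows U+FFFD or U+003F (a plain question mark) where the Korean text should be, the characters were most likely lost during decoding.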

Upvotes: 2
