Why is my program handling Character Encoding incorrectly?

Question

I wrote what I thought was very simple, very basic code to spit out Unicode characters, along with the underlying bytes.

public class UnicodeTesting {
    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version"));
        String header = "\u2554\u2550";
        for(byte b : header.getBytes()) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
        System.out.println(header);
    }
}

And when I run this code on OnlineGDB.com, I get the output I expect.

1.8.0_201
E2 95 94 E2 95 90 
╔═

However, when I run this exact same code in my Eclipse IDE locally, I get a very different result:

1.8.0_131
3F 3F 
??

Why is this happening?

If I edit the code on the Eclipse side of things, I can at least get the byte values to arrive at what I expect by forcing the getBytes method to use UTF-8 Encoding:

import java.io.UnsupportedEncodingException;

public class UnicodeTesting2 {
    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(System.getProperty("java.version"));
        String header = "\u2554\u2550";
        for(byte b : header.getBytes("UTF-8")) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
        System.out.println(header);
    }
}

1.8.0_131
E2 95 94 E2 95 90 
??

(I am assuming that my console simply does not support these characters, so I'm not worried about them turning out incorrect)

But this doesn't explain why the program itself behaves differently in the two environments: getBytes() appears to default to UTF-8 on OnlineGDB but to ASCII (I'm assuming) in Eclipse.

Curiosa Globunznik · Accepted Answer

Per the Java String documentation:

getBytes()
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

getBytes(String charsetName)
Encodes this String into a sequence of bytes using the given charset, storing the result into a new byte array.

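On your system the default charset is not UTF-8. It does not have to be ASCII either; on Windows it is often windows-1252. Whatever it is, it cannot represent U+2554 or U+2550, so in practice each character is replaced with '?' (0x3F), which is the 3F 3F you see. The JVM on OnlineGDB evidently defaults to UTF-8, which is why the same code produces E2 95 94 E2 95 90 there.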
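
To check this on your own machine, here is a minimal sketch that prints the JVM's default charset and encodes the same string three ways. The class name CharsetDemo and the helper toHex are made up for illustration; the rest is standard java.nio.charset API available in Java 8. US-ASCII is used only as a stand-in for "a charset that lacks these characters"; your actual default may be something else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String header = "\u2554\u2550";

        // The charset that the no-argument getBytes() will use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Platform-dependent: the result varies with the line above.
        System.out.println("default:  " + toHex(header.getBytes()));

        // Explicit charset: the same bytes on every platform.
        // Prints "UTF-8:    E2 95 94 E2 95 90"
        System.out.println("UTF-8:    " + toHex(header.getBytes(StandardCharsets.UTF_8)));

        // A charset without these characters substitutes '?' for each one.
        // Prints "US-ASCII: 3F 3F"
        System.out.println("US-ASCII: " + toHex(header.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Passing a Charset constant such as StandardCharsets.UTF_8 rather than the name "UTF-8" also avoids the checked UnsupportedEncodingException. If you want the no-argument getBytes() itself to behave the same everywhere, the usual approach on Java 8 is to start the JVM with -Dfile.encoding=UTF-8 (in Eclipse, as a VM argument in the run configuration), though naming the charset explicitly is the more robust fix.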
