Kachna

Reputation: 2961

Character encoding in Java

I have tried the code below:

public static void main(String[] args) throws IOException {
    String s = "NETWORK";
    try (
            FileOutputStream fos = new FileOutputStream("d:/endian.txt");
            OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF_16BE");) {
        osw.write(s);
        osw.flush();

    }
}

After running it, I get a file that contains the following string: N E T W O R K. The size of the resulting file is 14 bytes (7 characters * 2 bytes). Notice the spaces between the characters. When I change the encoding to UTF_16LE, I again get a 14-byte file, but it contains the string NETWORK, with no spaces between the characters. I expected N E T W O R K in both cases. I used Notepad to open the file. Can anyone explain this behavior?

Upvotes: 4

Views: 101

Answers (2)

Necreaux

Reputation: 9786

Don't use Notepad to open the file. It does a terrible job of detecting the encoding. Use a better tool that lets you specify the encoding, e.g. Notepad++, or a hex editor.
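If you want to confirm from Java that the file itself is fine, you can read it back with the same encoding it was written with. This is only a minimal sketch: the class name RoundTrip and the helper writeAndReadBack are made up for illustration, and it writes to a temporary file instead of d:/endian.txt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RoundTrip {

    // Hypothetical helper: writes the text to a temporary file using the given
    // charset, then decodes the file's bytes with that same charset.
    static String writeAndReadBack(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("endian", ".txt");
            try {
                Files.write(file, text.getBytes(charset));
                return new String(Files.readAllBytes(file), charset);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both encodings round-trip to the original string when the reader
        // is told which encoding to use instead of having to guess.
        System.out.println(writeAndReadBack("NETWORK", StandardCharsets.UTF_16BE));
        System.out.println(writeAndReadBack("NETWORK", StandardCharsets.UTF_16LE));
    }
}
```

If both lines print NETWORK, the written bytes are correct and the difference you see comes from how the viewer guesses the encoding.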

Upvotes: 1

PrimosK

Reputation: 13918

Binary representation of the "NETWORK" string using:

  • UTF_16BE is:

    00 4E 00 45 00 54 00 57 00 4F 00 52 00 4B (Notepad: N E T W O R K)

  • UTF_16LE is:

    4E 00 45 00 54 00 57 00 4F 00 52 00 4B 00 (Notepad: NETWORK)

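The behaviour you are describing occurs because neither file starts with a byte order mark, so Notepad has to guess the encoding. It treats the UTF_16BE representation of "NETWORK" as ANSI, where each 00 byte shows up as a blank next to a letter, and it treats the UTF_16LE representation as UNICODE (Notepad's name for UTF-16LE), so the string is displayed correctly.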

As already suggested, it would be better to use a hex editor to look at the binary representation of the produced files in order to see exactly what gets written.
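If you don't have a hex editor at hand, a few lines of Java can print the same bytes. This is a rough sketch; the class name EncodingDump and the helper toHex are invented for illustration and are not part of your code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDump {

    // Hypothetical helper: encodes the text with the given charset and
    // formats every resulting byte as two uppercase hex digits.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so the byte is formatted as an unsigned value.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "NETWORK";
        // prints "UTF-16BE: 00 4E 00 45 00 54 00 57 00 4F 00 52 00 4B"
        System.out.println("UTF-16BE: " + toHex(s, StandardCharsets.UTF_16BE));
        // prints "UTF-16LE: 4E 00 45 00 54 00 57 00 4F 00 52 00 4B 00"
        System.out.println("UTF-16LE: " + toHex(s, StandardCharsets.UTF_16LE));
    }
}
```

Both outputs are 14 bytes long and differ only in the order of the two bytes that make up each character.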

Upvotes: 4
