Java string - UTF and byte representation

Question

I am wondering about Java Strings and their byte representation. I have a file encoded in UTF-16 little-endian; when I view it in my hex editor, I see:

ff fe 61 00 f3 00 61 00 00

Now, when I load it into Java using:

    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), "UTF-16"));
    StringBuilder builder = new StringBuilder();
    String line;

    while ((line = reader.readLine()) != null)
        builder.append(line);
    System.out.println(Arrays.toString(builder.toString().getBytes()));

I see this output:

[97, -13, 97]

If I am printing the bytes, why don't I see the zero bytes that appear in my hex editor?
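
To reproduce the decoding step without the file, here is a minimal sketch that runs the same reading loop over an in-memory copy of the bytes. It assumes the file holds only the BOM (ff fe) followed by the three characters a, ó, a in UTF-16LE; the class name DecodeDemo and the helper readAll are illustrative names, and ByteArrayInputStream stands in for FileInputStream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Assumed file content: BOM (ff fe), then 'a', U+00F3, 'a' in UTF-16LE.
    static final byte[] FILE_BYTES = {
        (byte) 0xff, (byte) 0xfe, 0x61, 0x00, (byte) 0xf3, 0x00, 0x61, 0x00
    };

    // Same loop as in the question, but reading from memory
    // (ByteArrayInputStream stands in for FileInputStream).
    static String readAll(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_16))) {
            StringBuilder builder = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line);
            }
            return builder.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = readAll(FILE_BYTES);
        // The "UTF-16" decoder uses the BOM to pick little-endian order and
        // discards it, so only the three characters remain.
        System.out.println(s.length());
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Running it prints 3 followed by U+0061, U+00F3 and U+0061: the decoded String contains neither the BOM nor any zero bytes, only characters.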

RA. · Accepted Answer

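That is because getBytes() does not show you how the String is stored in memory. It encodes the string into a new byte array, and the no-argument overload uses the platform's default charset (see the javadoc). On your machine that is apparently a single-byte charset such as windows-1252 or ISO-8859-1, so each character becomes one byte, and ó (U+00F3) becomes 0xF3, printed as -13 because Java's byte type is signed. (Internally a String is a sequence of UTF-16 char values; since Java 9, strings containing only Latin-1 characters are stored with one byte per char, but that is an implementation detail that getBytes() never exposes.)

The BOM (ff fe) is not part of the string either: the "UTF-16" decoder reads it to determine the byte order and then discards it.

To see UTF-16 bytes, pass the charset explicitly. getBytes(StandardCharsets.UTF_16LE) returns [97, 0, -13, 0, 97, 0], which matches your hex editor minus the BOM. Note that getBytes("UTF-16") encodes big-endian and writes a big-endian BOM (fe ff), so it returns [-2, -1, 0, 97, 0, -13, 0, 97]: the zero bytes come before each character rather than after it.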
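
A small sketch comparing the same string encoded with several explicit charsets may make the difference concrete. EncodeDemo and its encode helper are illustrative names only; the calls themselves are the standard String.getBytes(Charset) and StandardCharsets constants (Java 7+).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeDemo {
    // Encodes with an explicit charset, so the result does not depend on
    // the platform default that plain getBytes() uses.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "a\u00f3a"; // 'a', U+00F3 (o with acute accent), 'a'

        // Single-byte charset: 0xF3 prints as -13 because bytes are signed
        System.out.println(Arrays.toString(encode(s, StandardCharsets.ISO_8859_1))); // prints [97, -13, 97]
        // Little-endian, no BOM: matches the hex editor after the ff fe BOM
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16LE))); // prints [97, 0, -13, 0, 97, 0]
        // "UTF-16" encodes big-endian and prepends the BOM fe ff
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16))); // prints [-2, -1, 0, 97, 0, -13, 0, 97]
        // UTF-8: U+00F3 becomes the two bytes c3 b3
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_8))); // prints [97, -61, -77, 97]
        // The charset that getBytes() with no argument uses on this JVM
        System.out.println(Charset.defaultCharset());
    }
}
```

The last line varies by machine and Java version, which is exactly why the no-argument getBytes() gave [97, -13, 97] in the question.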
