Reputation: 6689
I am wondering about Java Strings and their byte representation. I have a file encoded in UTF-16 little endian; when I view it in my hex editor I can see
ff fe 61 00 f3 00 61 00 00
Now, when I load it into Java using
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName),"UTF-16"));
StringBuilder builder = new StringBuilder();
String line;
while ((line = reader.readLine()) != null)
    builder.append(line);
System.out.println(Arrays.toString(builder.toString().getBytes()));
I can see in the output
[97, -13, 97]
If I am printing bytes, why can't I see the zero bytes that I can see in my hex editor?
Upvotes: 1
Views: 537
Reputation: 1423
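That is because getBytes() with no arguments encodes the string using the platform's default charset (see the javadoc), which is probably not UTF-16 on your machine. The output [97, -13, 97] is consistent with a single-byte charset such as ISO-8859-1 or windows-1252, where ó is the byte 0xF3. Internally a Java String is a sequence of UTF-16 code units, but getBytes() never exposes that representation directly; it always encodes to some charset. To see the zero bytes, pass the charset explicitly: getBytes("UTF-16LE") gives the little-endian bytes as in your file, without the BOM, while getBytes("UTF-16") gives big-endian bytes preceded by a big-endian BOM (fe ff), so the zeros come before each character's byte rather than after it.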
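To make the difference concrete, here is a minimal sketch that encodes the same three characters with three charsets. The class name Utf16BytesDemo and the helper encode are made up for illustration, and ISO-8859-1 only stands in for whatever the default charset on the asker's machine happens to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BytesDemo {
    // Hypothetical helper: encodes the text with the given charset
    // and returns the resulting bytes in Arrays.toString form.
    static String encode(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "a\u00f3a"; // "aóa", the characters from the question

        // Single-byte charset (assumed stand-in for the platform default): no zero bytes.
        System.out.println(encode(text, StandardCharsets.ISO_8859_1)); // [97, -13, 97]

        // Little endian, no BOM: matches the file contents after the ff fe prefix.
        System.out.println(encode(text, StandardCharsets.UTF_16LE));   // [97, 0, -13, 0, 97, 0]

        // Plain "UTF-16": Java writes a big-endian BOM (fe ff) and big-endian code units.
        System.out.println(encode(text, StandardCharsets.UTF_16));     // [-2, -1, 0, 97, 0, -13, 0, 97]
    }
}
```

Passing a Charset constant rather than a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.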
Upvotes: 3