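import java.util.Arrays;
// String(byte[], String) and getBytes(String) both throw UnsupportedEncodingException
byte[] bytes = new byte[] { 1, -1 };
System.out.println(Arrays.toString(new String(bytes, "UTF-8").getBytes("UTF-8")));
System.out.println(Arrays.toString(new String(bytes, "ISO-8859-1").getBytes("ISO-8859-1")));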
Output:
[1, -17, -65, -67]
[1, -1]
Why does the UTF-8 round trip change the bytes, while the ISO-8859-1 round trip returns them unchanged?
Upvotes: 0
String isn't a container for binary data; it is a container for char. The byte -1 (0xFF) is not legal anywhere in UTF-8, so it cannot be decoded to a meaningful char, and there's no reason why a bytes → String → bytes round trip should work in general. Ergo, don't do it.
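For contrast with the question's second line: Java's ISO-8859-1 charset maps each byte to the char with the same unsigned value (U+0000 to U+00FF), so that particular round trip happens to be lossless, while UTF-8's is not. A minimal sketch that checks this for all 256 byte values; the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Returns true if every byte value survives bytes -> String -> bytes with the given charset.
    static boolean roundTripsAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, cs).getBytes(cs);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        // ISO-8859-1 decodes byte b to char (b & 0xFF), so -1 becomes U+00FF
        String s = new String(new byte[] { 1, -1 }, StandardCharsets.ISO_8859_1);
        System.out.println((int) s.charAt(1));                                // 255
        System.out.println(roundTripsAllBytes(StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTripsAllBytes(StandardCharsets.UTF_8));      // false
    }
}
```

Even so, relying on this is still treating a String as a byte container, which is what the answer above warns against.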
Upvotes: 0
-1 (0xFF) is never valid in UTF-8-encoded data. [-17, -65, -67] is 0xEF 0xBF 0xBD, the UTF-8 encoding of the replacement character U+FFFD that the decoder substitutes for the invalid byte.
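Both claims can be checked directly. The sketch below encodes U+FFFD to confirm its UTF-8 bytes, and uses a strict CharsetDecoder (set to CodingErrorAction.REPORT) so that invalid input throws instead of being silently replaced. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementCharBytes {
    // UTF-8 encoding of U+FFFD, the replacement character
    static byte[] replacementBytes() {
        return "\uFFFD".getBytes(StandardCharsets.UTF_8);
    }

    // Strict decode: reports malformed input as an exception instead of substituting U+FFFD
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replacementBytes()));   // [-17, -65, -67]
        System.out.println(isValidUtf8(new byte[] { 1, -1 }));     // false
        System.out.println(isValidUtf8(new byte[] { 1 }));         // true
    }
}
```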
Upvotes: 2
Your byte array isn't a valid UTF-8-encoded string, so the string you get from
new String(bytes, "UTF-8")
contains U+0001 (for the first byte) and U+FFFD, the replacement character, to signify bad data in the second byte. When that string is encoded using UTF-8, you get the byte pattern shown: 1, followed by EF BF BD (printed as -17, -65, -67), which is the UTF-8 encoding of U+FFFD.
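A small sketch that prints the char values of the decoded string and then re-encodes it, reproducing the first line of the question's output. It uses StandardCharsets.UTF_8 instead of the "UTF-8" name to avoid the checked exception; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Decode {
    // Returns the UTF-16 code units of the decoded string as ints
    static int[] decodedCodeUnits(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).chars().toArray();
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 };
        // Prints U+0001, then U+FFFD for the invalid byte 0xFF
        for (int cu : decodedCodeUnits(bytes)) {
            System.out.printf("U+%04X%n", cu);
        }
        // Re-encoding yields 1 followed by the three bytes of U+FFFD
        String s = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8))); // [1, -17, -65, -67]
    }
}
```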
Basically you shouldn't try to interpret arbitrary binary data as if it were encoded in a particular encoding. If you want to represent arbitrary binary data as a string, use something like base64.
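A minimal base64 round trip using java.util.Base64 (available since Java 8). The wrapper class and method names are hypothetical; the encoder and decoder calls are the standard library ones.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Binary data -> ASCII-safe text
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // ASCII-safe text -> the original bytes
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 };
        String text = encode(bytes);
        System.out.println(text);                          // Af8=
        System.out.println(Arrays.toString(decode(text))); // [1, -1]
    }
}
```

Unlike decoding with a text charset, base64 is defined for every byte sequence, so the round trip is lossless regardless of content, at the cost of about 33% size overhead.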
Upvotes: 6