Reputation: 89
Why are the following two results different?
bsh % System.out.println((byte)'\u0080');
-128
bsh % System.out.println("\u0080".getBytes()[0]);
63
Thanks for your answers.
Upvotes: 3
Views: 1650
Reputation: 74800
Actually, if you want to get the same result with the getBytes() call, specify UTF-16LE as the charset encoding:
bsh % System.out.println("\u0080".getBytes("UTF-16LE")[0]);
-128
Java strings are sequences of UTF-16 code units, and since we want the low-order byte, as with the char -> byte cast, we use little endian here. Big endian works too if we change the array index:
bsh % System.out.println("\u0080".getBytes("UTF-16BE")[1]);
-128
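To see why the index changes, it may help to print both bytes of each encoding. This is only a sketch; the class and method names (Utf16Bytes, utf16le, utf16be) are made up for illustration, and it uses StandardCharsets constants instead of the charset name strings above:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
public class Utf16Bytes {
    // UTF-16 little endian: low-order byte of each code unit comes first.
    static byte[] utf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // UTF-16 big endian: high-order byte of each code unit comes first.
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(utf16le("\u0080"))); // [-128, 0]
        System.out.println(Arrays.toString(utf16be("\u0080"))); // [0, -128]
        System.out.println((byte) '\u0080');                    // -128
    }
}
```

The low-order byte 0x80 sits at index 0 in little endian and at index 1 in big endian, which matches the two snippets above.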
Upvotes: 0
Reputation: 346536
(byte)'\u0080'
just takes the numerical value of the code point (0x80 = 128), which does not fit into a byte (range -128 to 127) and is thus subject to a narrowing primitive conversion, which drops the bits that don't fit into the byte and, since the highest-order remaining bit is set, yields a negative number.
"\u0080".getBytes()[0]
transforms the characters to bytes according to your platform default encoding (there is an overloaded getBytes() method that allows you to specify the encoding). It looks like your platform default encoding cannot represent code point U+0080 and replaces it with "?" (code point U+003F, decimal value 63).
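The two behaviours can be reproduced side by side without depending on the platform default. In this sketch US-ASCII stands in for "an encoding that cannot represent U+0080"; that is an assumption, since the asker's actual default encoding is not known, and the class and method names are made up:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class CastVersusEncode {
    // Narrowing primitive conversion: keeps only the low 8 bits of the char value.
    static byte narrow(char c) {
        return (byte) c;
    }

    // Encoding: getBytes substitutes the charset's replacement byte
    // for characters the charset cannot represent.
    static byte firstAsciiByte(String s) {
        return s.getBytes(StandardCharsets.US_ASCII)[0];
    }

    public static void main(String[] args) {
        System.out.println(narrow('\u0080'));         // -128 (0x80 read as a signed byte)
        System.out.println(firstAsciiByte("\u0080")); // 63 ('?')
    }
}
```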
Upvotes: 5
Reputation: 597412
On my machine the byte array has 2 elements, [-62, -128], because my default encoding is UTF-8, which needs two bytes for any code point from U+0080 to U+07FF. Never use getBytes() without specifying an encoding.
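If you are wondering where -62 and -128 come from, here is a rough sketch that builds the two UTF-8 bytes by hand and compares them with what getBytes produces when UTF-8 is given explicitly. The class and the encodeTwoByte helper are made up for illustration and only cover the two-byte range:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
public class Utf8TwoByteDemo {
    // Hand-rolled UTF-8 encoding for code points in the two-byte range U+0080..U+07FF.
    static byte[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not in the two-byte UTF-8 range");
        }
        byte lead = (byte) (0xC0 | (codePoint >> 6));   // 110xxxxx: upper 5 bits
        byte cont = (byte) (0x80 | (codePoint & 0x3F)); // 10xxxxxx: lower 6 bits
        return new byte[] { lead, cont };
    }

    public static void main(String[] args) {
        byte[] manual = encodeTwoByte(0x80);
        byte[] library = "\u0080".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(manual));  // [-62, -128]
        System.out.println(Arrays.toString(library)); // [-62, -128]
    }
}
```

For 0x80 the lead byte is 0xC2 and the continuation byte is 0x80, which print as -62 and -128 because Java bytes are signed.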
Upvotes: 2
Reputation: 533880
When a string contains a character that the target encoding doesn't support, getBytes() turns it into '?', which is 63 in ASCII.
Try:
System.out.println(Arrays.toString("\u0080".getBytes("UTF-8")));
This prints:
[-62, -128]
Upvotes: 1
Reputation: 242786
Unicode character U+0080 <control> can't be represented in your system default encoding and is therefore replaced by ? (ASCII code 0x3F = 63) when the string is encoded into your default encoding by getBytes().
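If you'd rather check up front than get a silent ?, one possible approach is to ask the charset's encoder whether it can represent the string. This is just a sketch with made-up class and method names; US-ASCII and UTF-8 are used as examples, and your default encoding may behave differently:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class CanEncodeCheck {
    // True if every character of s can be encoded in the given charset.
    static boolean representable(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(representable("\u0080", StandardCharsets.US_ASCII)); // false
        System.out.println(representable("\u0080", StandardCharsets.UTF_8));    // true
        // The result for the platform default depends on the machine.
        System.out.println(representable("\u0080", Charset.defaultCharset()));
    }
}
```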
Upvotes: 3