Reputation: 267
I recently discovered that you can convert a String to a byte array in the following manner:
String S = "ab";
byte arr[] = S.getBytes();
Now, I tried with the String "\u9999" and the answer was [63]. I thought it would be 9999 (mod 128) = 15, which is actually what we get if we do byte b = (byte) 9999. What is the reason behind the 63?
Upvotes: 4
Views: 3515
Reputation: 5954
It's about the default charset. It may have something to do with the encoding of your Java file.
(On my machine, when I compile the Java file with cp1252 encoding, getBytes() also seems to use cp1252 as the default charset. Since cp1252 cannot represent this Unicode character, it becomes a ? character, i.e. 63. When I compile with UTF-16 encoding, getBytes() returns the data 0x9999 as expected.)
The behavior of this method when this string cannot be encoded in the default charset is unspecified. (Source: getBytes() from oracle.com)
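To see which charset getBytes() falls back to on a given machine, and whether that charset can represent the character at all, something like the following sketch may help. The class name DefaultCharsetCheck and the helper canEncode are made up for illustration; the output depends on your platform, so none is shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // Hypothetical helper: true if the charset has a byte sequence for c.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // The charset that the no-argument getBytes() uses.
        Charset def = Charset.defaultCharset();
        System.out.println("Default charset: " + def);
        System.out.println("Can encode U+9999: " + canEncode(def, '\u9999'));
        // For comparison, two charsets with a known answer.
        System.out.println("US-ASCII: " + canEncode(StandardCharsets.US_ASCII, '\u9999'));
        System.out.println("UTF-8: " + canEncode(StandardCharsets.UTF_8, '\u9999'));
    }
}
```

If the second line prints false, the no-argument getBytes() cannot produce a faithful encoding of the character on that machine.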
My suggestion is to simply use "\u9999".getBytes(StandardCharsets.UTF_16LE) (or UTF_16BE) to get the 2-byte array you want, so there is no need to worry about the encoding of the Java source. The array should be {-103, -103}. A byte with a value of -103 is represented in memory as 0x99.
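A minimal sketch of that suggestion, assuming Java 7 or later for StandardCharsets. The class name Utf16Demo and the helper encodeUtf16Le are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Demo {
    // Encodes with an explicit charset, so the platform default never matters.
    static byte[] encodeUtf16Le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUtf16Le("\u9999");
        System.out.println(Arrays.toString(bytes)); // [-103, -103]

        // Java bytes are signed; masking with 0xFF shows the unsigned value.
        System.out.printf("%02x %02x%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // 99 99

        // Narrowing the code point 0x9999 keeps only the low 8 bits.
        System.out.println((byte) 0x9999); // -103
    }
}
```

Note that the escape \u9999 is hexadecimal, so the character's code point is 0x9999 (39321 in decimal), not decimal 9999. Using StandardCharsets.UTF_16 instead of the LE or BE variant would also prepend a two-byte byte-order mark.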
Upvotes: 1
Reputation: 6674
For Unicode characters, you can specify the encoding in the call to getBytes:
byte arr[] = S.getBytes("UTF8");
As for why you are getting 63 as a result: the call to getBytes without a parameter uses your platform's default encoding. The character \u9999 cannot be represented in your default encoding, so it gets turned into ?, which in ASCII has the decimal value 63.
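The replacement can be reproduced on any machine by naming a charset that lacks the character. The sketch below uses US-ASCII as a stand-in for a default encoding that cannot represent \u9999; that choice, the class name ReplacementDemo, and the helper encode are assumptions for illustration, and your actual default may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    // Thin illustrative wrapper around String.getBytes(Charset).
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "\u9999";

        // US-ASCII has no mapping for U+9999, so the encoder substitutes '?'.
        System.out.println(Arrays.toString(encode(s, StandardCharsets.US_ASCII))); // [63]

        // UTF-8 can represent it, using three bytes (0xE9 0xA6 0x99).
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_8))); // [-23, -90, -103]

        // 63 is simply the ASCII code of the question mark.
        System.out.println((byte) '?'); // 63
    }
}
```

Passing a Charset constant rather than a name string such as "UTF8" also avoids the checked UnsupportedEncodingException.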
Upvotes: 6