Forcing Unicode in byte variable

Question

I recently discovered that you can convert a String to a byte array in the following manner:

String S = "ab";
byte arr[] = S.getBytes();

Now, I tried this with the String "\u9999" and the result was [63]. I expected 9999 (mod 128) = 15, which is what we get from byte b = (byte) 9999. What is the reason behind the 63?

p e p · Accepted Answer

For Unicode characters, you can specify the encoding in the call to getBytes:

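// "UTF-8" is the canonical charset name; this overload throws the checked UnsupportedEncodingException
byte arr[] = S.getBytes("UTF-8");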
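As a minimal sketch of the same idea, assuming Java 7 or later, the charset can be passed as a StandardCharsets constant instead of a name, which avoids the checked exception. The class name Utf8BytesDemo and the helper utf8Bytes are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BytesDemo {
    // Hypothetical helper: encodes the text as UTF-8 regardless of the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+9999 needs three bytes in UTF-8: 0xE9 0xA6 0x99 (shown as signed Java bytes).
        byte[] arr = utf8Bytes("\u9999");
        System.out.println(Arrays.toString(arr)); // prints [-23, -90, -103]
    }
}
```

The values are negative because Java bytes are signed; masking with 0xFF recovers the unsigned values 233, 166 and 153.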

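As for why you are getting 63: calling getBytes without a parameter uses your platform's default encoding. The character \u9999 cannot be represented in that encoding, so it is replaced with ?, which has the decimal value 63 in ASCII. Note also that \u9999 is a hexadecimal escape: it denotes the code point U+9999 (39321 in decimal), not the number 9999, and getBytes encodes characters rather than reducing their numeric value modulo anything.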
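The substitution can be reproduced without depending on the platform default. The sketch below assumes US-ASCII as a stand-in for a default encoding that cannot represent the character; the class name ReplacementByteDemo and the helper asciiBytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementByteDemo {
    // Hypothetical helper: encodes with US-ASCII, which has no mapping for U+9999.
    static byte[] asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The unmappable character is replaced by '?', whose code is 63.
        System.out.println(Arrays.toString(asciiBytes("\u9999"))); // prints [63]
        System.out.println((int) '?');                             // prints 63

        // The escape is hexadecimal, so the char value is 0x9999, not decimal 9999.
        System.out.println((int) '\u9999');                        // prints 39321

        // A narrowing cast keeps only the low 8 bits; it is unrelated to encoding.
        System.out.println((byte) 9999);                           // prints 15
    }
}
```

The last line shows where the 15 in the question comes from: it is a property of the integer cast, not of how getBytes encodes text.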
