Reputation: 131
I have a sequence of hex bytes: 35 d8 de de de de 43 f2 71 84 4b f3 be 4d 4d 65 4a 17 41 bb 40 a5 85 c4 bd fd 7a 4e fb 24 27 4e
This is 32 bytes!
I do this:
String b = "35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e";
byte[] bytes = fromHex(b);
String st = new String(bytes, StandardCharsets.UTF_8);
System.out.println(bytes.length); // 32
System.out.println(st.length()); // 30
private static byte[] fromHex(String hex)
{
    byte[] binary = new byte[hex.length() / 2];
    for(int i = 0; i < binary.length; i++)
    {
        binary[i] = (byte)Integer.parseInt(hex.substring(2*i, 2*i+2), 16);
    }
    return binary;
}
And I get this output:
32
30
But I expected to get a 32-character UTF-8 string! Why do I get a 30 character string? How can I get 32 UTF-8 bytes?
Upvotes: 0
Views: 415
Reputation: 140299
Why do I get a 30 character string?
There are byte sequences in that input where multiple bytes are converted to a single Unicode code point when decoding from UTF-8. For example, c4 bd is a two-byte sequence that decodes to the single code point U+013D.
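To see the effect in isolation, here is a minimal sketch that decodes two short inputs taken from the question's data. The class name Utf8DecodeDemo is made up for illustration; fromHex is the question's own helper. It also shows a second thing that happens here: a byte that is not valid UTF-8 on its own is replaced with U+FFFD by the String constructor.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {
    // Same hex-to-bytes conversion as in the question.
    static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }

    public static void main(String[] args) {
        // 0xC4 0xBD is a well-formed two-byte UTF-8 sequence: two bytes, one char.
        String twoBytes = new String(fromHex("c4bd"), StandardCharsets.UTF_8);
        System.out.println(twoBytes.length());                        // 1
        System.out.println(Integer.toHexString(twoBytes.charAt(0)));  // 13d

        // 0xDE is a lead byte that needs a continuation byte after it.
        // On its own it is malformed, so the decoder substitutes U+FFFD.
        String malformed = new String(fromHex("de"), StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(malformed.charAt(0))); // fffd
    }
}
```

So the decoded string is not just shorter than the byte array; some of the original byte values are gone, replaced by U+FFFD.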
How can I get 32 UTF-8 bytes?
You can't. Decoded as UTF-8, it's a 30-character string.
And it's wrong anyway to say "UTF-8 bytes". Once decoded, they're characters, not bytes any more.
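If what is actually needed is a 32-character string that can be turned back into the same 32 bytes (an assumption about the goal, the question doesn't say), one possible approach is a single-byte charset such as ISO-8859-1, which maps every byte value to exactly one char. The sketch below compares the two charsets; RoundTripDemo and survivesRoundTrip are hypothetical names, and the resulting string is not meaningful text, only a lossless carrier. Keeping the hex string itself is another option.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Same hex-to-bytes conversion as in the question.
    static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }

    // Hypothetical helper: does decode-then-encode give back the original bytes?
    static boolean survivesRoundTrip(byte[] bytes, Charset cs) {
        return Arrays.equals(bytes, new String(bytes, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e");

        // UTF-8 replaces malformed input with U+FFFD, so the original bytes are lost.
        System.out.println(survivesRoundTrip(bytes, StandardCharsets.UTF_8));        // false

        // ISO-8859-1 maps each byte to one char in U+0000..U+00FF and back.
        System.out.println(survivesRoundTrip(bytes, StandardCharsets.ISO_8859_1));   // true
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).length()); // 32
    }
}
```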
Upvotes: 2