alexhak

Reputation: 131

Hex-string to UTF-8-string in Java

I have a sequence of hex bytes: 35 d8 de de de de 43 f2 71 84 4b f3 be 4d 4d 65 4a 17 41 bb 40 a5 85 c4 bd fd 7a 4e fb 24 27 4e

This is 32 bytes!

I do this:

import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        String b = "35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e";
        byte[] bytes = fromHex(b);
        String st = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(bytes.length);   // 32
        System.out.println(st.length());    // 30
    }

    // Converts a hex string (two digits per byte) into the bytes it encodes.
    private static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }
}

And I get this output:

32
30

But I expected to get a 32-character UTF-8 string! Why do I get a 30-character string? How can I get 32 UTF-8 bytes?

Upvotes: 0

Views: 415

Answers (1)

Andy Turner

Reputation: 140299

Why do I get a 30 character string?

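There are byte sequences in that byte array where multiple bytes are converted to a single Unicode code point when decoding from UTF-8. For example, c4 bd is a valid two-byte sequence that decodes to the single code point U+013D. Bytes that do not form valid UTF-8 are not kept as they are either: new String(bytes, charset) replaces malformed input with U+FFFD.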
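To see the effect on small inputs, here is a minimal sketch. The class name Utf8LengthDemo and the helper decodeUtf8 are made up for illustration; fromHex is the same conversion as in the question. It decodes one well-formed two-byte sequence and one malformed byte taken from the question's data, then the full 32 bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Same conversion as fromHex in the question.
    public static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }

    // Hypothetical helper: decodes bytes as UTF-8; malformed input becomes U+FFFD.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // c4 bd is a well-formed two-byte sequence: two bytes, one char (U+013D).
        String pair = decodeUtf8(fromHex("c4bd"));
        System.out.println(pair.length());                       // 1
        System.out.println(Integer.toHexString(pair.charAt(0))); // 13d

        // de announces a two-byte sequence, but 43 ('C') is not a continuation
        // byte, so de is replaced by U+FFFD and 'C' is decoded normally.
        String bad = decodeUtf8(fromHex("de43"));
        System.out.println(bad.length());                        // 2
        System.out.println(Integer.toHexString(bad.charAt(0)));  // fffd

        // The full input from the question: 32 bytes in, 30 chars out.
        String all = decodeUtf8(
            fromHex("35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e"));
        System.out.println(all.length());                        // 30
    }
}
```

So the char count only matches the byte count when every byte is plain ASCII or an isolated invalid byte; any multi-byte sequence the decoder consumes as a unit yields fewer chars than bytes.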

How can I get 32 UTF-8 bytes?

You can't. Decoded as UTF-8, it's a 30-character string.

And it's wrong anyway to say "UTF-8 bytes" at this point. Once decoded, they're not bytes any more: a String holds chars (UTF-16 code units).
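If the real goal is a String that carries all 32 byte values, one option (an assumption about what is wanted, not something stated in the question) is to decode with a charset that maps every byte to exactly one char, such as ISO-8859-1. The sketch below uses a made-up class name, RoundTripDemo, and contrasts that with the UTF-8 round trip, which does not give the original bytes back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Same conversion as fromHex in the question.
    public static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }

    public static void main(String[] args) {
        byte[] original =
            fromHex("35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e");

        // UTF-8 decoding replaces malformed bytes with U+FFFD, so encoding the
        // resulting String again does not reproduce the original 32 bytes.
        String viaUtf8 = new String(original, StandardCharsets.UTF_8);
        byte[] reEncoded = viaUtf8.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, reEncoded)); // false

        // ISO-8859-1 maps each byte value 0x00..0xFF to exactly one char,
        // so the String has 32 chars and the round trip is lossless.
        String viaLatin1 = new String(original, StandardCharsets.ISO_8859_1);
        byte[] back = viaLatin1.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1.length());                 // 32
        System.out.println(Arrays.equals(original, back));      // true
    }
}
```

The resulting String is not meaningful text, only a container for the byte values. If the data is arbitrary binary, keeping it as a byte[] or as the hex string is usually the safer choice.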

Upvotes: 2
