Java convert unicode code point to string

Question

How can UTF-8 value like =D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0 be converted in Java?

I have tried something like:

Character.toCodePoint((char)(Integer.parseInt("D0", 16)),(char)(Integer.parseInt("93", 16));

but it does not convert to a valid code point.

Andreas · Accepted Answer

That string is an encoding of bytes in hex, so the best way is to decode the string into a byte[], then call new String(bytes, StandardCharsets.UTF_8).

Update

Here is a slightly more direct version of decoding the string, than provided by "sstan" in another answer. Of course both versions are good, so use whichever makes you more comfortable, or write your own version.

String src = "=D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0";

assert src.length() % 3 == 0;
byte[] bytes = new byte[src.length() / 3];
for (int i = 0, j = 0; i < bytes.length; i++, j+=3) {
    assert src.charAt(j) == '=';
    bytes[i] = (byte)(Character.digit(src.charAt(j + 1), 16) << 4 |
                      Character.digit(src.charAt(j + 2), 16));
}
String str = new String(bytes, StandardCharsets.UTF_8);

System.out.println(str);

Output

Газета

Java convert unicode code point to string

Answers (2)

Related Questions