Encode a codepoint

Question

I have a Unicode codepoint, which could be anything: possibly ASCII, possibly something in the BMP, and possibly an exotic emoji such as U+1F612.

I expected there to be an easy way to take a codepoint and encode it into a byte array, but I can't find one. I can turn it into a String and then encode that, but this is a roundabout route: it first encodes the codepoint to UTF-16 and then re-encodes it to the required charset. I'd like to encode it directly to bytes.

public static byte[] encodeCodePoint(int codePoint, Charset charset) {
    // Surely there's got to be a better way than this:
    return new StringBuilder().appendCodePoint(codePoint).toString().getBytes(charset);
}

Remy Lebeau · Accepted Answer

There is really no way to avoid UTF-16: Java represents text as UTF-16, and that is what the charset converters are designed to work with. That doesn't mean you have to go through a String for the UTF-16 data, though:

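import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public static byte[] encodeCodePoint(int codePoint, Charset charset) {
    // One char for a BMP codepoint, a surrogate pair for a supplementary one
    char[] chars = Character.toChars(codePoint);
    CharBuffer cb = CharBuffer.wrap(chars);
    ByteBuffer buff = charset.encode(cb);
    // Copy only the encoded bytes; the backing array may be larger
    byte[] bytes = new byte[buff.remaining()];
    buff.get(bytes);
    return bytes;
}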
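
One thing to be aware of: Charset.encode() replaces anything the target charset can't represent with that charset's replacement bytes (typically '?') rather than failing. If you would rather get an exception, a variant along these lines may help. This is only a sketch: the class name CodePointEncoding and the method name encodeCodePointStrict are made up for illustration, and it assumes you want both malformed input and unmappable characters reported.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CodePointEncoding {

    // Hypothetical helper (not part of the answer above): same idea as
    // encodeCodePoint, but throws instead of substituting replacement bytes.
    public static byte[] encodeCodePointStrict(int codePoint, Charset charset)
            throws CharacterCodingException {
        // One char for a BMP codepoint, a surrogate pair for a supplementary
        // one; throws IllegalArgumentException if codePoint is out of range.
        char[] chars = Character.toChars(codePoint);

        // Ask the encoder to report problems rather than replace them.
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteBuffer buff = encoder.encode(CharBuffer.wrap(chars));
        byte[] bytes = new byte[buff.remaining()];
        buff.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // U+1F612 encoded as UTF-8
        byte[] utf8 = encodeCodePointStrict(0x1F612, StandardCharsets.UTF_8);
        for (byte b : utf8) {
            System.out.printf("%02X ", b);
        }
        System.out.println();

        // ISO-8859-1 has no mapping for an emoji, so this should throw
        try {
            encodeCodePointStrict(0x1F612, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            System.out.println("not encodable in ISO-8859-1");
        }
    }
}
```

For U+1F612 in UTF-8 the first loop prints F0 9F 98 92. The UTF-16 step is still there (Character.toChars produces the surrogate pair), but no String is created along the way.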
