user1192878

Reputation: 734

UTF-8 to code point

I need to implement a method like this: int toCodePoint(byte[] buf, int startIndex); It should decode the UTF-8 character that starts at startIndex in the byte array and return its code point. No extra objects should be created (that is why I don't use the JDK String class for decoding). Are there any existing Java classes that do this? Thank you.

Upvotes: 2

Views: 3059

Answers (2)

Malcolm

Reputation: 41498

You can use java.nio.charset.CharsetDecoder to do that. You'll need a ByteBuffer and a CharBuffer. Put the data into the ByteBuffer, then use CharsetDecoder.decode(ByteBuffer in, CharBuffer out, boolean endOfInput) to decode into the CharBuffer. Then you can get the code point from the CharBuffer's backing array using Character.codePointAt(char[] a, int index). It is important to use this method because characters outside the BMP are decoded into two chars (a surrogate pair), so reading only one char is not sufficient.

With this method you only need to create the decoder and the two buffers once; after that, no new objects will be created unless some error occurs.
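To make the idea concrete, here is a minimal sketch of that approach. The class name Utf8CodePointReader and the choice to pass the byte array to the constructor (so that it is wrapped only once) are my own assumptions, not part of any JDK API; the 4-byte window is there so the decoder never reads past the character being asked for. Treat it as an illustration rather than a tuned implementation, and note that it is not thread-safe.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: every object is created once, in the constructor.
final class Utf8CodePointReader {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final char[] chars = new char[2];               // room for one surrogate pair
    private final CharBuffer out = CharBuffer.wrap(chars);  // view over chars
    private final ByteBuffer in;                            // view over the caller's array

    Utf8CodePointReader(byte[] buf) {
        this.in = ByteBuffer.wrap(buf);
    }

    int toCodePoint(int startIndex) {
        // Restrict the input to at most 4 bytes, the longest UTF-8 sequence.
        in.clear();
        in.limit(Math.min(in.capacity(), startIndex + 4));
        in.position(startIndex);
        out.clear();
        decoder.reset();
        // The result is ignored on purpose: bytes after the first character
        // may be truncated by the window, which is not an error for us.
        decoder.decode(in, out, true);
        if (out.position() == 0) {
            // Error path: this is the one place where an object is allocated.
            throw new IllegalArgumentException(
                    "malformed or truncated UTF-8 at index " + startIndex);
        }
        // Combines a surrogate pair into one code point if two chars were decoded.
        return Character.codePointAt(chars, 0, out.position());
    }
}
```

Usage would look like new Utf8CodePointReader(buf).toCodePoint(startIndex), keeping the reader around for as long as the same array is being decoded.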

Upvotes: 4

lxbndr

Reputation: 2208

None of the existing Java classes I know of fit this task, because of your restriction ("No extra objects should be created"). Otherwise you could use CharsetDecoder (as mentioned by Malcolm). Or you could even go over to the dark side and use sun.io.ByteToCharUTF8 if you really need a pure static method. But that is an internal, unsupported class, so it is not the recommended way.
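If the restriction really is strict, decoding by hand is another option, since UTF-8 is simple enough to unpack with a few bit operations. The following is only a sketch under my own assumptions: the class name Utf8 and the convention of returning -1 for malformed or truncated input are made up for illustration and do not come from the JDK.

```java
// Hypothetical allocation-free decoder; not a JDK class.
final class Utf8 {
    private Utf8() {}

    /** Returns the code point starting at startIndex, or -1 if the bytes are not valid UTF-8. */
    static int toCodePoint(byte[] buf, int startIndex) {
        int b0 = buf[startIndex] & 0xFF;
        if (b0 < 0x80) {
            return b0;                       // single-byte (ASCII) character
        }
        int len;  // total length of the sequence in bytes
        int cp;   // payload bits collected so far
        int min;  // smallest code point allowed for this length (rejects overlong forms)
        if ((b0 & 0xE0) == 0xC0) {
            len = 2; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            len = 3; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            len = 4; cp = b0 & 0x07; min = 0x10000;
        } else {
            return -1;                       // continuation byte or invalid lead byte
        }
        if (startIndex + len > buf.length) {
            return -1;                       // sequence runs past the end of the array
        }
        for (int i = 1; i < len; i++) {
            int b = buf[startIndex + i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                return -1;                   // expected a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return -1;                       // overlong, out of range, or a surrogate
        }
        return cp;
    }
}
```

Returning -1 instead of throwing keeps the error path free of allocations as well; whether that fits depends on how the caller wants to handle bad input.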

Upvotes: 0
