Kong

Reputation: 9546

Decode bytes to chars one at a time

I have an arbitrary chunk of bytes that represent chars, encoded in an arbitrary scheme (may be ASCII, UTF-8, UTF-16). I know the encoding.

What I'm trying to do is find the location of the last newline (\n) in the array of bytes. I want to know how many bytes are left over after reading the last encoded \n.

I can't find anything in the JDK or any other library that will let me convert a byte array to chars one by one. InputStreamReader reads the stream in chunks, not giving me any indication how many bytes are getting read to produce a char.

Am I going to have to do something as horrible as re-encoding each char to figure out its byte length?

Upvotes: 2

Views: 1466

Answers (1)

Evgeniy Dorofeev

Reputation: 136002

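You can try something like this:

    // needs java.nio.ByteBuffer, java.nio.CharBuffer, java.nio.charset.*
    CharsetDecoder cd = Charset.forName("UTF-8").newDecoder(); // substitute the known encoding
    ByteBuffer in = ByteBuffer.wrap(bytes);
    CharBuffer out = CharBuffer.allocate(1); // room for one char, so decode() stops after each one
    int p = 0; // byte offset where the current char starts
    while (in.hasRemaining()) {
        out.clear();
        cd.decode(in, out, true);
        if (out.position() == 0) {
            break; // nothing decoded: malformed input, or a surrogate pair that needs two chars
        }
        char c = out.get(0);
        int nBytes = in.position() - p; // bytes consumed to produce c
        p = in.position();
    }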
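
To get from there to the number the question actually asks for, here is a minimal sketch that tracks the byte offset just past the last decoded \n. The class and method names (`TrailingBytes`, `bytesAfterLastNewline`) are made up for illustration. It assumes the decoder only consumes the bytes of the chars it emits, which is how the JDK's built-in decoders appear to behave but is not something I would treat as a documented guarantee. It also assumes you want `bytes.length` back when there is no newline at all.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper, not part of the JDK.
final class TrailingBytes {

    /**
     * Returns the number of bytes that follow the last encoded '\n'.
     * Returns bytes.length when no newline is found (an assumption of this sketch).
     */
    static int bytesAfterLastNewline(byte[] bytes, Charset charset) {
        // A new decoder reports malformed input by default instead of replacing it.
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer one = CharBuffer.allocate(1); // forces decode() to stop after one char
        CharBuffer two = CharBuffer.allocate(2); // fallback for a surrogate pair
        int endOfLastNewline = 0;                // byte offset just past the last '\n'

        while (in.hasRemaining()) {
            one.clear();
            // endOfInput = false: a truncated trailing sequence is reported as
            // underflow rather than as an error, which suits an arbitrary chunk.
            CoderResult result = decoder.decode(in, one, false);

            if (one.position() == 1) {
                if (one.get(0) == '\n') {
                    endOfLastNewline = in.position();
                }
                continue;
            }
            if (result.isOverflow()) {
                // The next code point needs two chars (a surrogate pair); it cannot be '\n'.
                two.clear();
                decoder.decode(in, two, false);
                if (two.position() == 0) {
                    break; // no progress, give up rather than loop forever
                }
                continue;
            }
            if (result.isError()) {
                // Skip the malformed or unmappable bytes and keep scanning.
                in.position(in.position() + result.length());
                continue;
            }
            break; // underflow with no output: incomplete sequence at the end of the chunk
        }
        return bytes.length - endOfLastNewline;
    }
}
```

Calling `TrailingBytes.bytesAfterLastNewline(bytes, StandardCharsets.UTF_8)` on the bytes of "a\nbc" should give 2, and the same text encoded as UTF-16LE should give 4. Skipping malformed bytes instead of stopping is a design choice; throw there instead if bad input should be fatal.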

Upvotes: 4
