Andy Chan

Reputation: 287

Java ByteBuffer clear data

I know that Java's ByteBuffer.clear() does not actually erase the data in the ByteBuffer. So when I StringBuilder.append() a string on every iteration, the final result always contains the remaining chars in the ByteBuffer, which are the old data from the last write. How can I fix this issue?

int byteRead = -1;
int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
while ((byteRead = readableByteChannel.read(buffer)) > 0 && readCount < 68) {
    sb.append(new String(buffer.array(), "UTF-8"));
    buffer.clear();
    readCount++;
}
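To see why the loop above picks up old bytes, here is a small, self-contained sketch (the class and method names are made up for illustration). It writes into a ByteBuffer, calls clear(), writes a shorter string, and then decodes either the whole backing array, as the loop above does, or only the part before position().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrates that ByteBuffer.clear() resets position and limit
// but leaves the bytes in the backing array untouched.
final class ClearDemo {
    // Decodes the entire backing array, like new String(buffer.array(), "UTF-8").
    static String wholeArrayAfterTwoWrites(String first, String second, int capacity) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        buffer.put(first.getBytes(StandardCharsets.UTF_8));
        buffer.clear(); // position = 0, limit = capacity; the bytes themselves are untouched
        buffer.put(second.getBytes(StandardCharsets.UTF_8));
        return new String(buffer.array(), StandardCharsets.UTF_8);
    }

    // Decodes only the bytes written since the last clear(): [arrayOffset, arrayOffset + position).
    static String writtenPartOnly(String first, String second, int capacity) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        buffer.put(first.getBytes(StandardCharsets.UTF_8));
        buffer.clear();
        buffer.put(second.getBytes(StandardCharsets.UTF_8));
        return new String(buffer.array(), buffer.arrayOffset(), buffer.position(),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(wholeArrayAfterTwoWrites("HELLO", "ab", 5)); // abLLO
        System.out.println(writtenPartOnly("HELLO", "ab", 5));          // ab
    }
}
```

The stale "LLO" is exactly the kind of leftover data described in the question.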

Upvotes: 1

Views: 4137

Answers (3)

Holger

Reputation: 298311

As already pointed out by other answers, you have to consider the position of the buffer, which gets updated by the read method. So the correct code looks like:

while ((byteRead = readableByteChannel.read(buffer)) > 0 && readCount < 68) {
    // the third argument is a length: decode the position() bytes starting at arrayOffset()
    sb.append(new String(buffer.array(),
        buffer.arrayOffset(), buffer.position(), "UTF-8"));
    buffer.clear();
    readCount++;
}

Note that in your specific case, arrayOffset() will always be zero, but you had better write the code in a way that doesn’t break when you change something in the buffer allocation code.
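A minimal runnable sketch of this position-based loop, assuming the data comes from a ByteArrayInputStream and that the 68-read limit is passed in as a parameter; PositionAwareRead and its method are hypothetical names. It uses StandardCharsets.UTF_8 instead of "UTF-8", so no checked UnsupportedEncodingException has to be handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class PositionAwareRead {
    // Reads at most maxReads chunks and decodes only the bytes between
    // arrayOffset() and arrayOffset() + position() of each chunk.
    static String read(InputStream is, int bufferSize, int maxReads) {
        StringBuilder sb = new StringBuilder();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        ReadableByteChannel channel = Channels.newChannel(is);
        try {
            int readCount = 0;
            // check the limit first so no chunk is read and then thrown away
            while (readCount < maxReads && channel.read(buffer) > 0) {
                sb.append(new String(buffer.array(), buffer.arrayOffset(),
                        buffer.position(), StandardCharsets.UTF_8));
                buffer.clear(); // position back to 0 for the next read
                readCount++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        System.out.println(read(new ByteArrayInputStream(data), 4, 68)); // abcdefghij
    }
}
```

The input here is plain ASCII on purpose; with multi-byte characters this approach has the problem discussed next.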

But this code is still broken. When you read a multi-byte UTF-8 sequence, it may happen that the first bytes of that sequence are read in one operation and the remaining bytes in the next one. Your attempts to create String instances from these incomplete sequences will produce invalid characters (the String constructor replaces malformed input with U+FFFD). Besides that, you are creating these String instances just to copy their contents into a StringBuilder, which is quite inefficient.
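The split-sequence problem is easy to reproduce without any channel. This sketch (hypothetical names) encodes a string as UTF-8, cuts the bytes in the middle of the two-byte sequence for é (U+00E9, bytes 0xC3 0xA9), and decodes both halves separately, as the chunked loop would.

```java
import java.nio.charset.StandardCharsets;

final class SplitSequenceDemo {
    // Encodes text as UTF-8, splits the bytes at splitAt, and decodes
    // each half on its own, like one String per read() in the loop.
    static String decodeInTwoChunks(String text, int splitAt) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String first = new String(bytes, 0, splitAt, StandardCharsets.UTF_8);
        String second = new String(bytes, splitAt, bytes.length - splitAt, StandardCharsets.UTF_8);
        return first + second;
    }

    public static void main(String[] args) {
        // "a\u00e9b" is 0x61 0xC3 0xA9 0x62; index 2 falls inside the 0xC3 0xA9 pair
        String result = decodeInTwoChunks("a\u00e9b", 2);
        System.out.println(result.equals("a\u00e9b"));      // false
        System.out.println(result.indexOf('\uFFFD') >= 0); // true
    }
}
```

Splitting at index 1 or 3 (a character boundary) would give back the original string, which is why the bug only shows up occasionally.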

So, to do it correctly, you should do something like:

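int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
while(readableByteChannel.read(buffer) > 0 && readCount < 68) {
    buffer.flip(); // make the bytes read so far available for decoding
    // decode; whenever cBuffer is full, move its contents to sb and continue
    while(dec.decode(buffer, cBuffer, false).isOverflow()) {
        cBuffer.flip();
        sb.append(cBuffer);
        cBuffer.clear();
    }
    buffer.compact(); // keep an incomplete trailing sequence for the next read
    readCount++;
}
// decode the remaining bytes, signalling that no more input follows
buffer.flip();
for(boolean more = true; more; ) {
    more = dec.decode(buffer, cBuffer, true).isOverflow();
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
}
dec.flush(cBuffer); // completes the decoding operation, as the CharsetDecoder contract requires
cBuffer.flip();
sb.append(cBuffer);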

Note how both ReadableByteChannel and CharsetDecoder process the buffers using their positions and limits. All you have to do is use flip and compact correctly, as shown in the documentation of compact.

The only exception is appending to the StringBuilder, as that’s not an NIO operation. There, we have to call clear() ourselves: StringBuilder.append copies all remaining characters of the buffer, but does not advance its position.
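The flip / decode / compact cycle can be exercised without a channel by feeding a byte array to the decoder in small chunks that stand in for read() results. This is a sketch under a few assumptions: the names are hypothetical, the decoder is configured with CodingErrorAction.REPLACE so malformed input cannot stall the loop, and the byte buffer gets 3 extra bytes of room for the incomplete trailing sequence that compact() keeps.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ChunkedDecodeDemo {
    // Feeds bytes to a UTF-8 decoder in chunks of chunkSize bytes, so multi-byte
    // sequences get split across chunks and must be carried over by compact().
    static String decodeInChunks(byte[] bytes, int chunkSize) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        // room for one chunk plus up to 3 leftover bytes of an incomplete sequence
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize + 3);
        CharBuffer cBuffer = CharBuffer.allocate(16);
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            buffer.put(bytes, off, Math.min(chunkSize, bytes.length - off)); // simulated read
            buffer.flip(); // switch to draining the bytes collected so far
            while (dec.decode(buffer, cBuffer, false).isOverflow()) {
                drain(cBuffer, sb);
            }
            buffer.compact(); // move undecoded trailing bytes to the front
        }
        buffer.flip();
        while (dec.decode(buffer, cBuffer, true).isOverflow()) {
            drain(cBuffer, sb);
        }
        dec.flush(cBuffer);
        drain(cBuffer, sb);
        return sb.toString();
    }

    // append() reads the chars but leaves the position alone, hence clear()
    private static void drain(CharBuffer cBuffer, StringBuilder sb) {
        cBuffer.flip();
        sb.append(cBuffer);
        cBuffer.clear();
    }

    public static void main(String[] args) {
        String text = "a\u00e9b\u20acc";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeInChunks(data, 1).equals(text)); // true
    }
}
```

Even with one byte per chunk, the two-byte é and the three-byte € (U+20AC) come out intact, because the decoder leaves their first bytes in the buffer until the rest arrives.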

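Note that the loop reading from readableByteChannel still does not deal with certain (unavoidable) error conditions: since you stop after an arbitrary number of reads, it’s always possible that you cut off in the middle of a multi-byte UTF-8 sequence. In that case, the final decode call reports malformed input rather than overflow, so the last loop simply stops and the incomplete bytes are dropped.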
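To illustrate this error case, the following sketch (hypothetical names) hands a default UTF-8 decoder an input that ends after the first byte of a two-byte sequence, with endOfInput set to true. The decoder produces the complete characters and then reports malformed input instead of overflow, so a loop that only checks isOverflow() ends without any sign of the dropped byte.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class TruncatedInputDemo {
    // Decodes bytes as the final chunk (endOfInput = true) with a default
    // UTF-8 decoder, which reports malformed input instead of replacing it.
    static CoderResult decodeFinal(byte[] bytes, StringBuilder decoded) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(bytes.length); // UTF-8 never yields more chars than bytes
        CoderResult result = dec.decode(in, out, true);
        out.flip();
        decoded.append(out);
        return result;
    }

    public static void main(String[] args) {
        // 'a' followed by only the first byte (0xC3) of the sequence for U+00E9
        byte[] cut = {(byte) 'a', (byte) 0xC3};
        StringBuilder sb = new StringBuilder();
        CoderResult r = decodeFinal(cut, sb);
        System.out.println(sb);              // a
        System.out.println(r.isMalformed()); // true
        System.out.println(r.isOverflow());  // false
    }
}
```

If you need to detect this situation, checking result.isError() after the final decode call is one way to do it.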


But this rather complicated logic has already been implemented in the JRE, and if you give up the idea of cutting off after a certain number of bytes, you can utilize that:

int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharBuffer cBuffer= CharBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
Reader reader=Channels.newReader(readableByteChannel, "UTF-8");
while(reader.read(cBuffer) > 0 && readCount < 68) {
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
    readCount++;
}

Now this code limits the reading to at most 256 × 68 characters rather than bytes, but for UTF-8 encoded data this makes a difference only when there are multi-byte sequences, which you apparently didn’t care about before.
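A runnable version of the Channels.newReader loop, assuming a ByteArrayInputStream as input and with the buffer size and read limit as parameters (ReaderLimitDemo is a hypothetical name). Because the limit is counted in read() calls on a CharBuffer, each call adds at most bufferSize characters, regardless of how many bytes they occupy in UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class ReaderLimitDemo {
    // Reads at most maxReads chunks of up to bufferSize chars each;
    // the Reader does the UTF-8 decoding, including split sequences.
    static String readChars(InputStream is, int bufferSize, int maxReads) {
        StringBuilder sb = new StringBuilder();
        CharBuffer cBuffer = CharBuffer.allocate(bufferSize);
        ReadableByteChannel channel = Channels.newChannel(is);
        Reader reader = Channels.newReader(channel, "UTF-8");
        try {
            int readCount = 0;
            while (readCount < maxReads && reader.read(cBuffer) > 0) {
                cBuffer.flip();
                sb.append(cBuffer);
                cBuffer.clear();
                readCount++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "a\u00e9b\u20acc".getBytes(StandardCharsets.UTF_8);
        String all = readChars(new ByteArrayInputStream(data), 2, 68);
        System.out.println(all.equals("a\u00e9b\u20acc")); // true
        String limited = readChars(new ByteArrayInputStream(data), 2, 1);
        System.out.println(limited.length() <= 2);          // true
    }
}
```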

Finally, since you apparently have an InputStream in the first place, you don’t need the ReadableByteChannel detour at all:

int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
while(reader.read(cBuffer) > 0 && readCount < 68) {
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
    readCount++;
}

This might not look like NIO code, but Readers are still the canonical way of reading character data, even with NIO; there’s no replacement. The method Reader.read(CharBuffer) was missing in the first release of NIO (Java 1.4), but was added in Java 5.
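To check that the Reader really takes care of split sequences, this sketch wraps the data in a hypothetical OneByteStream that returns at most one byte per read call, so every multi-byte character arrives in pieces. The InputStreamReader loop (here without the read limit) still reassembles the text correctly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class ByteAtATimeDemo {
    // Hypothetical test stream: hands out at most one byte per read call.
    static final class OneByteStream extends ByteArrayInputStream {
        OneByteStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // The InputStreamReader loop, reading until end of stream.
    static String readAll(InputStream is, int bufferSize) {
        StringBuilder sb = new StringBuilder();
        CharBuffer cBuffer = CharBuffer.allocate(bufferSize);
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        try {
            while (reader.read(cBuffer) > 0) {
                cBuffer.flip();
                sb.append(cBuffer);
                cBuffer.clear();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a\u00e9b\u20acc";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new OneByteStream(data), 2).equals(text)); // true
    }
}
```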

Upvotes: 6

Krzysztof Krasoń

Reputation: 27476

Use position() to get the current buffer position, and get the filled part of the array with Arrays.copyOfRange:

Arrays.copyOfRange(buffer.array(), 0, buffer.position());

Which, in your case, becomes:

sb.append(new String(Arrays.copyOfRange(buffer.array(), 0, buffer.position()), "UTF-8"));

Or, even shorter, using the appropriate String constructor:

sb.append(new String(buffer.array(), 0, buffer.position(), "UTF-8"));

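Note that slice() does not help here: buffer.slice().array() returns the same complete backing array as buffer.array(), so new String(buffer.slice().array(), "UTF-8") has the same problem as your original code. If you want to work with the buffer rather than its array, flip it and decode it directly: buffer.flip(); sb.append(StandardCharsets.UTF_8.decode(buffer));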

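By the way, instead of "UTF-8" it is better to use StandardCharsets.UTF_8: the String constructors that take a Charset do not throw the checked UnsupportedEncodingException.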
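A small sketch of the two position()-based variants, using Arrays.copyOfRange and the offset/length String constructor with StandardCharsets.UTF_8. The class and method names are hypothetical, and the offset 0 assumes a buffer created with ByteBuffer.allocate, whose arrayOffset() is 0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FilledPartDemo {
    // Copies only the bytes before position() and decodes the copy.
    static String viaCopy(ByteBuffer buffer) {
        byte[] filled = Arrays.copyOfRange(buffer.array(), 0, buffer.position());
        return new String(filled, StandardCharsets.UTF_8);
    }

    // Decodes the same range in place, without the intermediate copy.
    static String viaOffsetAndLength(ByteBuffer buffer) {
        return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put("abcdefgh".getBytes(StandardCharsets.US_ASCII)); // fills the buffer
        buffer.clear();
        buffer.put("XY".getBytes(StandardCharsets.US_ASCII));
        System.out.println(viaCopy(buffer));            // XY
        System.out.println(viaOffsetAndLength(buffer)); // XY
        System.out.println(new String(buffer.array(), StandardCharsets.UTF_8)); // XYcdefgh
    }
}
```

Neither method changes the buffer's position, so the loop's buffer.clear() afterwards still works as before.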

Upvotes: 0

Titus

Reputation: 22474

You can use the String(byte[] bytes, int offset, int length, String charsetName) constructor:

new String(buffer.array(), 0, byteRead, "UTF-8"); 

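This prevents the old data from the previous read from ending up in the new String. It works because buffer.clear() resets the position to 0 before each read, so the byteRead new bytes always start at index 0.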
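A sketch of this loop that also checks the invariant it relies on (byteRead equals position() right after each read), assuming a ByteArrayInputStream as input; ByteReadChunks and readChunks are hypothetical names. It collects each decoded chunk separately so the chunk boundaries are visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteReadChunks {
    // Decodes each chunk with new String(array, 0, byteRead, ...), which is
    // valid only because clear() puts the position back to 0 before every read.
    static List<String> readChunks(InputStream is, int bufferSize) {
        List<String> chunks = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        ReadableByteChannel channel = Channels.newChannel(is);
        try {
            int byteRead;
            while ((byteRead = channel.read(buffer)) > 0) {
                // the new bytes occupy indices [0, byteRead)
                if (byteRead != buffer.position()) {
                    throw new IllegalStateException("buffer was not cleared before reading");
                }
                chunks.add(new String(buffer.array(), 0, byteRead, StandardCharsets.UTF_8));
                buffer.clear();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] data = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readChunks(new ByteArrayInputStream(data), 4));
    }
}
```

Like the other String-per-chunk answers, this still assumes no multi-byte sequence is split between reads.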

Upvotes: 0
