Reputation: 207

What is the length of a string encoded in a ByteBuffer?

byte[] byteArray = Charset.forName("UTF-8").encode("hello world").array();
System.out.println(byteArray.length);

Why does the above code print 12? Shouldn't it print 11 instead?

Upvotes: 9

Answers (3)

David Ehrmann

Reputation: 7576

Because encode() returns a ByteBuffer, and array() hands you that buffer's entire backing array. The array's length is the buffer's capacity (not really even that, because of possible slicing), not the number of bytes actually used. It's a bit like how malloc(10) is free to return 32 bytes of memory.

System.out.println(Charset.forName("UTF-8").encode("hello world").limit());

That's 11 (as expected).
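
If what you actually want is a byte[] holding only the encoded bytes, something like the following should work. This is a minimal sketch, not the only way to do it: the class and method names (EncodedBytes, exactBytes) are made up for illustration, while remaining(), get(byte[]) and String.getBytes(Charset) are standard JDK calls.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodedBytes {
    // Hypothetical helper: copies only the bytes between position and limit,
    // ignoring any unused capacity in the buffer's backing array.
    public static byte[] exactBytes(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buf.remaining()]; // remaining() = limit - position
        buf.get(out);                           // copy just the encoded bytes
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exactBytes("hello world").length);                      // 11
        // If no ByteBuffer is needed at all, getBytes avoids the issue entirely.
        System.out.println("hello world".getBytes(StandardCharsets.UTF_8).length); // 11
    }
}
```

Copying through get() also avoids depending on array(), which is only meaningful when the buffer is backed by an accessible array.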

Upvotes: 2

Hot Licks

Reputation: 47759

import java.nio.charset.*;
public class ByteArrayTest {
    public static void main(String[] args) {
        String theString = "hello world";
        System.out.println(theString.length());
        byte[] byteArray = Charset.forName("UTF-8").encode(theString).array();
        System.out.println(byteArray.length);
        for (int i = 0; i < byteArray.length; i++) {
            System.out.println("Byte " + i + " = " + byteArray[i]);
        }
    }
}

Results:

C:\JavaTools>java ByteArrayTest
11
12
Byte 0 = 104
Byte 1 = 101
Byte 2 = 108
Byte 3 = 108
Byte 4 = 111
Byte 5 = 32
Byte 6 = 119
Byte 7 = 111
Byte 8 = 114
Byte 9 = 108
Byte 10 = 100
Byte 11 = 0

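The array looks null-terminated, like any good C-string would be, but that trailing 0 is not a terminator.

(The real cause is the flaky method array(), which returns the buffer's entire backing array, including unused capacity past the limit. It probably should not be used in "production" code, except with great care.)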

Upvotes: 0

azurefrog

Reputation: 10955

The length of the array is the ByteBuffer's capacity, which is derived from, but not necessarily equal to, the number of characters you are encoding. Let's take a look at how the memory for the ByteBuffer is allocated.

If you drill into the encode() method, you'll find that CharsetEncoder#encode(CharBuffer) looks like this:

public final ByteBuffer encode(CharBuffer in)
    throws CharacterCodingException
{
    int n = (int)(in.remaining() * averageBytesPerChar());
    ByteBuffer out = ByteBuffer.allocate(n);
    ...

According to my debugger, the averageBytesPerChar of a UTF_8$Encoder is 1.1, and the input String has 11 characters. 11 * 1.1 = 12.1, and the code casts the result to an int, which truncates it, so the resulting capacity of the ByteBuffer is 12.
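
The arithmetic can be reproduced on its own. This is only a sketch, and it assumes the 1.1 figure observed in the debugger; that value is a JDK implementation detail and may differ between versions. The class and method names (InitialCapacity, estimate) are hypothetical, while CharsetEncoder#averageBytesPerChar() is a real JDK method.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class InitialCapacity {
    // Mirrors the sizing expression quoted above: multiply, then truncate to int.
    public static int estimate(int chars, float averageBytesPerChar) {
        return (int) (chars * averageBytesPerChar);
    }

    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // Implementation-specific value; not guaranteed to be 1.1 on every JDK.
        System.out.println(enc.averageBytesPerChar());
        // With the 1.1 seen in the debugger: (int) 12.1 truncates to 12.
        System.out.println(estimate(11, 1.1f)); // 12
        // The same calculation using whatever this JDK's encoder reports.
        System.out.println(estimate("hello world".length(), enc.averageBytesPerChar()));
    }
}
```

Because the initial size is only an estimate, the capacity can be larger than the number of bytes written, which is why limit() or remaining() is the number to trust.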

Upvotes: 11
