Reputation: 207
byte[] byteArray = Charset.forName("UTF-8").encode("hello world").array();
System.out.println(byteArray.length);
Why does the code above print 12? Shouldn't it print 11 instead?
Upvotes: 9
Views: 2669
Reputation: 7576
Because encode() returns a ByteBuffer, and array() hands back its entire backing array. The array's length reflects the buffer's capacity (and not necessarily even that, because of possible slicing), not how many bytes are used. It's a bit like how malloc(10) is free to return 32 bytes of memory.
System.out.println(Charset.forName("UTF-8").encode("hello world").limit());
That's 11 (as expected).
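If you actually need a byte[] holding exactly the encoded bytes, here is a minimal sketch of two ways to get one. The class name EncodedBytes and the helper usedBytes are made up for illustration; the ByteBuffer and String calls are standard JDK methods.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EncodedBytes {
    // Hypothetical helper: copies only the bytes between position and limit,
    // so any spare capacity in the backing array is left behind.
    static byte[] usedBytes(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Both lines print 11 for "hello world".
        System.out.println(usedBytes("hello world").length);
        System.out.println("hello world".getBytes(StandardCharsets.UTF_8).length);
    }
}
```

If all you want is the bytes of a String, getBytes(StandardCharsets.UTF_8) is the simpler choice; the ByteBuffer route mostly makes sense when you already have a buffer in hand.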
Upvotes: 2
Reputation: 47759
import java.nio.charset.*;

public class ByteArrayTest {
    public static void main(String[] args) {
        String theString = "hello world";
        System.out.println(theString.length());
        byte[] byteArray = Charset.forName("UTF-8").encode(theString).array();
        System.out.println(byteArray.length);
        for (int i = 0; i < byteArray.length; i++) {
            System.out.println("Byte " + i + " = " + byteArray[i]);
        }
    }
}
Results:
C:\JavaTools>java ByteArrayTest
11
12
Byte 0 = 104
Byte 1 = 101
Byte 2 = 108
Byte 3 = 108
Byte 4 = 111
Byte 5 = 32
Byte 6 = 119
Byte 7 = 111
Byte 8 = 114
Byte 9 = 108
Byte 10 = 100
Byte 11 = 0
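The trailing 0 makes the array look null-terminated, like a C string, but that is a coincidence: byte 11 is just unused space in the buffer's backing array.
The real cause is the array() method, which returns the whole backing array rather than only the encoded bytes. It should not be used in production code except with great care.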
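As a rough sketch of what that care might look like when you do reach for array(): check hasArray(), and honour arrayOffset(), position() and limit() instead of assuming the backing array starts at 0 and is full. The class name BackingArraySlice and the helper slice are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BackingArraySlice {
    // Hypothetical helper: returns a copy of the buffer's remaining bytes.
    static byte[] slice(ByteBuffer buf) {
        if (!buf.hasArray()) {
            // Direct or read-only buffers expose no backing array; copy via get().
            byte[] out = new byte[buf.remaining()];
            buf.duplicate().get(out); // duplicate() leaves the caller's position untouched
            return out;
        }
        // The buffer's content may start part-way into the backing array.
        int start = buf.arrayOffset() + buf.position();
        int end = buf.arrayOffset() + buf.limit();
        return Arrays.copyOfRange(buf.array(), start, end);
    }

    public static void main(String[] args) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode("hello world");
        System.out.println(slice(encoded).length); // prints 11
    }
}
```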
Upvotes: 0
Reputation: 10955
The length of the array is the ByteBuffer's capacity, which is derived from, but not equal to, the number of characters you are encoding. Let's take a look at how the memory for the ByteBuffer is allocated...
If you drill into the encode() method, you'll find that CharsetEncoder#encode(CharBuffer) looks like this:
public final ByteBuffer encode(CharBuffer in)
throws CharacterCodingException
{
int n = (int)(in.remaining() * averageBytesPerChar());
ByteBuffer out = ByteBuffer.allocate(n);
...
According to my debugger, the averageBytesPerChar of a UTF_8$Encoder is 1.1, and the input String has 11 characters. 11 * 1.1 = 12.1, and the code casts the product to an int, which truncates it, so the resulting size of the ByteBuffer is 12.
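The arithmetic can be reproduced in isolation. This is only a sketch: the class name InitialCapacity and the method estimate are invented here and simply mirror the sizing expression quoted above. The value returned by averageBytesPerChar() is an implementation detail of the JDK you run on, so the first two printed values may differ from mine.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class InitialCapacity {
    // Hypothetical helper mirroring (int)(in.remaining() * averageBytesPerChar()).
    static int estimate(int chars, float averageBytesPerChar) {
        return (int) (chars * averageBytesPerChar); // the cast truncates toward zero
    }

    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // Implementation-dependent; the debugger session above reported 1.1.
        System.out.println(enc.averageBytesPerChar());
        System.out.println(estimate("hello world".length(), enc.averageBytesPerChar()));
        // With the values from the answer: 11 chars at 1.1 bytes each, truncated.
        System.out.println(estimate(11, 1.1f)); // prints 12
    }
}
```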
Upvotes: 11