Why is Charset.encoder adding nulls when converting this array of chars?

Question

This question is asking specifically why I am getting nulls from this encoding and is not a general question about how to convert a string to an array of bytes.

My actual use case involves input that is an array of chars, which I want to write to disk as an array of encoded bytes.

Why is it that when I try to encode a string in this way, the result has trailing nulls?

String someInput = "///server///server///server///";

char[] chars = someInput.toCharArray();
Charset encoding = StandardCharsets.UTF_8;

CharBuffer buf = CharBuffer.wrap(chars);

for (byte b : encoding.newEncoder().encode(buf).array())
   System.out.println("-> " + new Character((char)b));

The output is the following. Note that in the result example I have replaced the nulls with the '�' Unicode character for better visibility.

-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> �
-> �
-> �

Peter Lawrey · Accepted Answer

When the underlying array is created, the encoder doesn't know how big it needs to be, so it sizes and grows the array several bytes at a time (adding one byte at a time would be very inefficient).

However, once it has finished converting the text, it doesn't shrink the array (or take a trimmed copy), as that would also be expensive.

In short, you cannot assume the underlying array is exactly the size it needs to be; it may be larger. Treat position() and limit() as the bounds of the bytes to use.
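
As a minimal sketch of that advice, something like the following should give you only the bytes that were actually encoded. The class name EncodeExact and the helper encodeExact are made up for illustration and are not part of the JDK; the sketch relies on the ByteBuffer returned by CharsetEncoder.encode(CharBuffer) being positioned at the first valid byte with its limit after the last one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class EncodeExact {

    // Hypothetical helper: copies only the bytes between position() and limit(),
    // ignoring any spare capacity in the backing array.
    static byte[] encodeExact(char[] chars) throws CharacterCodingException {
        ByteBuffer encoded = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
        byte[] exact = new byte[encoded.remaining()]; // remaining() == limit() - position()
        encoded.get(exact);                           // bulk copy of the valid bytes only
        return exact;
    }

    public static void main(String[] args) throws CharacterCodingException {
        char[] chars = "///server///server///server///".toCharArray();

        ByteBuffer encoded = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
        // The backing array may be longer than the number of valid bytes.
        System.out.println("backing array length: " + encoded.array().length);
        System.out.println("valid bytes: " + encoded.remaining());

        // Printing from the trimmed copy should show no trailing nulls.
        for (byte b : encodeExact(chars))
            System.out.println("-> " + (char) b);
    }
}
```

If the goal is to write the bytes to disk, another option is to pass the ByteBuffer itself to a channel write, which also respects position() and limit() and avoids the extra copy.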
