Zhro

Reputation: 2614

Why is CharsetEncoder adding nulls when converting this array of chars?

This question is asking specifically why I am getting nulls from this encoding and is not a general question about how to convert a string to an array of bytes.

My actual use case involves input that is an array of chars, which I want to write to disk as an array of encoded bytes.

Why is it that when I try to encode a string in this way, the result has trailing nulls?

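import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// encode() throws the checked CharacterCodingException, so the enclosing method must declare or catch it
String someInput = "///server///server///server///";

char[] chars = someInput.toCharArray();
Charset encoding = StandardCharsets.UTF_8;

CharBuffer buf = CharBuffer.wrap(chars);

for (byte b : encoding.newEncoder().encode(buf).array())
   System.out.println("-> " + (char) b);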

The output is below. Note that I have replaced the nulls with the Unicode replacement character '�' (U+FFFD) for better visibility.

-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> �
-> �
-> �

Upvotes: 0

Views: 106

Answers (2)

pbajpai

Reputation: 1369

I agree with @Peter's answer; he is correct. I just want to add one more finding. I debugged this code and looked at this call inside the for loop:

 encoding.newEncoder().encode(buf).array()

Stepping into encode(buf), I found that the encode() method in CharsetEncoder.java, before doing the actual encoding, calculates the size of the buffer that will hold the encoded bytes with this line:

 int n = (int)(in.remaining() * averageBytesPerChar());

Here averageBytesPerChar() returns 1.1, and our input ("///server///server///server///") is 30 characters long, so the size of the newly allocated buffer is n = (int)(30 * 1.1) = 33.

That is why you see 3 extra entries at the end of the output: the encoded text fills only 30 of the 33 bytes, and the unused bytes stay zero (null). Hope this helps you understand it better.
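A small sketch to observe this directly: it encodes the same 30-character input and prints the returned buffer's position, limit, capacity and backing-array length next to the size estimate from the line quoted above. It assumes a JDK whose UTF-8 encoder reports averageBytesPerChar() as 1.1; the class and helper names (EncodeCapacityDemo, encodeUtf8) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeCapacityDemo {
    // Hypothetical helper: encodes with a fresh UTF-8 encoder, as in the question
    static ByteBuffer encodeUtf8(char[] chars) {
        try {
            return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
        } catch (CharacterCodingException e) {
            // Malformed input (e.g. an unpaired surrogate) ends up here; rethrown unchecked
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = "///server///server///server///".toCharArray();
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();

        // Mirrors the estimate quoted above from CharsetEncoder.encode()
        int estimate = (int) (chars.length * encoder.averageBytesPerChar());

        ByteBuffer bytes = encodeUtf8(chars);
        System.out.println("averageBytesPerChar = " + encoder.averageBytesPerChar());
        System.out.println("initial estimate    = " + estimate);
        System.out.println("position            = " + bytes.position());
        System.out.println("limit (bytes used)  = " + bytes.limit());
        System.out.println("capacity            = " + bytes.capacity());
        System.out.println("array().length      = " + bytes.array().length);
    }
}
```

On a JDK that behaves as described, limit should be 30 while capacity and array().length are 33; the difference is the three zero bytes printed in the question. The exact capacity is an implementation detail and could differ between JDK versions or charsets.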

Upvotes: 1

Peter Lawrey

Reputation: 533520

When the underlying array is created, the encoder doesn't know exactly how big it needs to be, so it grows the array several bytes/characters at a time when needed (growing it one byte at a time would be very inefficient).

However, once it has finished converting the text, it doesn't shrink the array (or take a trimmed copy), as that would also be expensive.

In short, you cannot assume the underlying array is exactly the size it needs to be; it may be larger. Treat position() and limit() as the bounds of the bytes to use.
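A minimal sketch of that advice, assuming UTF-8 as in the question: either copy only the remaining() bytes (those between position() and limit()) into a new array, or pass exactly that range to an OutputStream. The names EncodedBytes, encodeToBytes and writeTo are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodedBytes {
    // Returns exactly the encoded bytes, i.e. the region [position, limit)
    static byte[] encodeToBytes(char[] chars) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
        byte[] out = new byte[buf.remaining()]; // remaining() == limit() - position()
        buf.get(out);                            // copies only the valid bytes
        return out;
    }

    // Writes only the valid region of buf; leaves buf's position unchanged
    static void writeTo(ByteBuffer buf, OutputStream os) throws IOException {
        if (buf.hasArray()) {
            // Array-backed buffer: write the valid slice of the backing array directly
            os.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
        } else {
            // Direct or read-only buffer: copy the valid bytes out first
            byte[] tmp = new byte[buf.remaining()];
            buf.duplicate().get(tmp);
            os.write(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        char[] chars = "///server///server///server///".toCharArray();

        byte[] bytes = encodeToBytes(chars);
        System.out.println(bytes.length); // 30, no trailing nulls
        System.out.println(new String(bytes, StandardCharsets.UTF_8));

        ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeTo(buf, sink);
        System.out.println(sink.size()); // 30
    }
}
```

For the write-to-disk use case in the question, the byte[] from encodeToBytes could be handed to java.nio.file.Files.write(path, bytes). The copy costs one extra allocation; writeTo avoids it when the buffer is array-backed.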

Upvotes: 1
