Reputation:
I am not able to understand this: why does the following code print 12 and not 11, although hello world has only 11 characters?
byte[] byteArray = Charset.forName("UTF-8").encode("hello world").array();
System.out.println(byteArray.length);
Upvotes: 2
Views: 373
Reputation: 579
Using this program, you can figure out what bytes the byte array contains:
byte[] byteArray = Charset.forName("UTF-8").encode("hello world").array();
for (int i = 0; i < byteArray.length; i++) {
    System.out.println(byteArray[i] + " - " + ((char) byteArray[i]));
}
The bytes are (decimal):
104 101 108 108 111 32 119 111 114 108 100 0
The first 11 bytes are the UTF-8 encoding of the string hello world, as expected. The last byte is 0 (the null character). It is not part of the encoded string; it is unused space at the end of the buffer's backing array.
To deal with this, use the .limit() method of ByteBuffer, as mentioned in another answer, and read only that many bytes.
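One way to apply this is to copy only the bytes between the buffer's position and its limit into a new array. The following is a minimal sketch, not the only approach; the class name EncodedBytes and the helper toExactBytes are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodedBytes {
    // Hypothetical helper: copies only the bytes between position and limit,
    // so any unused space in the backing array is left out.
    static byte[] toExactBytes(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is not advanced by the read.
        buffer.duplicate().get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode("hello world");
        byte[] bytes = toExactBytes(buffer);
        System.out.println(bytes.length);
    }
}
```

Because the copy is sized from remaining(), it does not matter how large the backing array happens to be.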
Upvotes: 0
Reputation: 12849
Easy to see if you debug the array:
b=68, char=h
b=65, char=e
b=6C, char=l
b=6C, char=l
b=6F, char=o
b=20, char=
b=77, char=w
b=6F, char=o
b=72, char=r
b=6C, char=l
b=64, char=d
b=0, char=
So the last byte in the array is \u0000.
Upvotes: 3
Reputation: 747
I'm not sure what you are trying to accomplish, but to get the byte array of a string, why not just use:
String s = "hello world";
byte[] b = s.getBytes("UTF-8");
assertEquals(s.length(), b.length);
More information in this answer:
How to convert Strings to and from UTF8 byte arrays in Java
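Note that the assertion above holds for hello world only because every character in it is ASCII and takes one byte in UTF-8. The sketch below uses the StandardCharsets.UTF_8 overload of getBytes (available since Java 7, no checked exception) and shows a string where the two lengths differ. The class name Utf8Length and the helper utf8Length are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Hypothetical helper: number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello world";
        String accented = "h\u00e9llo"; // "héllo": the accented e takes two bytes in UTF-8

        // 11 chars, 11 bytes
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");
        // 5 chars, 6 bytes
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes");
    }
}
```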
Upvotes: 1
Reputation: 18569
The array method of ByteBuffer returns the array backing the buffer, but not all of its bytes are significant. Only the bytes up to limit are used. The following prints 11 as expected:
int limit = Charset.forName("UTF-8").encode("hello world").limit();
System.out.println(limit);
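To see the difference side by side, the sketch below prints the limit, the capacity, and the length of the backing array for the same buffer. The size of the backing array is an implementation detail of the encoder (the question observed 12), so only the limit should be relied on. The class and method names are made up for illustration, and the hasArray() check is there because array() is only supported on array-backed buffers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferBounds {
    // Number of bytes the encoder wrote; the returned buffer starts at position 0.
    static int usedBytes(ByteBuffer buffer) {
        return buffer.limit();
    }

    // Length of the backing array, or -1 if the buffer exposes no accessible array.
    // This may be larger than the number of bytes actually used.
    static int backingArrayLength(ByteBuffer buffer) {
        return buffer.hasArray() ? buffer.array().length : -1;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode("hello world");
        System.out.println("limit    = " + usedBytes(buffer));
        System.out.println("capacity = " + buffer.capacity());
        System.out.println("array    = " + backingArrayLength(buffer));
    }
}
```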
Upvotes: 7