Reputation: 90023
What Charset does ByteBuffer.asCharBuffer() use? It seems to convert 3 bytes to one character on my system.
On a related note, how does CharsetDecoder relate to ByteBuffer.asCharBuffer()?
UPDATE: With respect to what implementation of ByteBuffer I am using, I am invoking ByteBuffer.allocate(1024).asCharBuffer(). I can't comment on what implementation gets used under the hood.
Upvotes: 6
Views: 2102
Reputation: 7170
I wanted to expand on the answer by @Petteri H. It is true that asCharBuffer() expects the ByteBuffer to already be UTF-16 encoded. No further encoding conversion is performed. You can run an experiment using the code below.
First, create a plain text file called test.txt with a few lines.
Hello World
Hi Moon
Howdy Jupiter
This file will be UTF-8 encoded by default. We expect this to be a problem, since the CharBuffer will combine every two consecutive bytes into one character and give you garbage values. Later, we will fix the issue.
The following code will simply dump each character from the file. Note: It will treat each double byte sequence as a character.
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class Main {
    public static void main(String[] args) {
        try (var file = new RandomAccessFile("test.txt", "r")) {
            // Map the whole file into memory as a read-only ByteBuffer.
            var mappedMemory = file.getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, 0, file.length());
            // View the bytes as 16-bit chars; no charset decoding takes place.
            var buff = mappedMemory.asCharBuffer();
            for (int i = 0; i < buff.length(); ++i) {
                var ch = buff.get(i);
                System.out.print(ch);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
When you run the code you will see unexpected characters:
䡥汬漠坯牬搊䡩⁍潯渊䡯睤礠䩵灩瑥爊
Now, let's encode the same file using UTF-16.
iconv -f utf-8 -t utf-16 test.txt > test-fixed.txt
Change the Java code to read test-fixed.txt, then run it again.
Now, you will see the right output.
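Note that asCharBuffer() does not strip the BOM that test-fixed.txt will have. It comes through as the character U+FEFF, which most terminals render as nothing. The view also uses the buffer's byte order, which is big-endian by default, so this works as-is only if iconv wrote big-endian UTF-16. If it wrote little-endian, either pass -t utf-16be to iconv or set the buffer's order to ByteOrder.LITTLE_ENDIAN before calling asCharBuffer().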
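The BOM and byte-order behaviour can be checked in memory, without iconv. The following is only a sketch: the class name BomDemo and the helper view() are made up for illustration, and it relies on Java's StandardCharsets.UTF_16 encoder writing a big-endian BOM followed by big-endian code units.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Hypothetical helper: reinterprets the bytes as 16-bit chars using the
    // given byte order. No charset decoding happens here.
    static String view(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        // Encoding with UTF_16 yields the bytes FE FF 00 48 00 69.
        byte[] bytes = "Hi".getBytes(StandardCharsets.UTF_16);

        String big = view(bytes, ByteOrder.BIG_ENDIAN);
        String little = view(bytes, ByteOrder.LITTLE_ENDIAN);

        // Big-endian view: the BOM is kept as the first char, then 'H' and 'i'.
        System.out.println(big.length());
        System.out.println(Integer.toHexString(big.charAt(0)));

        // Little-endian view of the same bytes: every pair is swapped.
        System.out.println(Integer.toHexString(little.charAt(0)));
        System.out.println(Integer.toHexString(little.charAt(1)));
    }
}
```

If the file turns out to be little-endian, calling order(ByteOrder.LITTLE_ENDIAN) on the mapped buffer before asCharBuffer() should be enough to get readable text.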
Upvotes: 0
Reputation: 90023
Looking at jdk7, jdk/src/share/classes/java/nio:
X-Buffer.java.template maps ByteBuffer.allocate() to Heap-X-Buffer.java.template
Heap-X-Buffer.java.template maps ByteBuffer.asCharBuffer() to ByteBufferAs-X-Buffer.java.template
ByteBuffer.asCharBuffer().toString() invokes CharBuffer.put(CharBuffer)
Eventually this probably leads to Bits.makeChar(), which is defined as:
static private char makeChar(byte b1, byte b0) {
    return (char)((b1 << 8) | (b0 & 0xff));
}
but I can't figure out how.
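Without tracing the JDK internals, one way to test this guess from the outside is to copy the helper and compare it with what the char view returns. This is only a sketch: MakeCharDemo is a made-up class name, makeChar() here is a local copy of the method quoted above rather than a call into java.nio.Bits, and the buffer is left in its default big-endian order.

```java
import java.nio.ByteBuffer;

class MakeCharDemo {
    // Local copy of the quoted helper, so the example does not touch JDK internals.
    // b1 becomes the high byte, b0 the low byte.
    static char makeChar(byte b1, byte b0) {
        return (char) ((b1 << 8) | (b0 & 0xff));
    }

    public static void main(String[] args) {
        byte[] bytes = {0x48, 0x65}; // 'H' and 'e' as single-byte ASCII/UTF-8

        // Default byte order is big-endian, so the first byte is the high byte.
        char viaBuffer = ByteBuffer.wrap(bytes).asCharBuffer().get(0);
        char viaHelper = makeChar(bytes[0], bytes[1]);

        System.out.println(viaBuffer == viaHelper);
        System.out.println(Integer.toHexString(viaBuffer));
    }
}
```

If the two values agree, the view is doing nothing more than packing pairs of bytes into chars, which is consistent with no charset being involved.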
Upvotes: 0
Reputation: 30216
As I understand it, it doesn't use any charset. It just assumes the bytes are already encoded the way Java represents strings, which means UTF-16. This can be shown by looking at the source for HeapByteBuffer, where the returned CharBuffer finally calls a helper that packs two bytes into one char:
static private char makeChar(byte b1, byte b0) {
    return (char)((b1 << 8) | (b0 & 0xff));
}
So the only thing that is handled here is the endianness; for the rest, you're responsible. This also means it's usually much more useful to use CharsetDecoder, where you can specify the encoding.
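To make the contrast concrete, here is a minimal sketch that decodes UTF-8 bytes with a CharsetDecoder and then shows the asCharBuffer() view of the same bytes. The class name DecodeDemo and the helper decodeUtf8() are made up for illustration, and wrapping the checked exception in IllegalArgumentException is just one possible choice.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: decodes the bytes as UTF-8 and fails loudly on bad input.
    static String decodeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
            return chars.toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "Hi Moon".getBytes(StandardCharsets.UTF_8);

        // Real decoding: the charset is stated explicitly.
        System.out.println(decodeUtf8(bytes));

        // Reinterpretation only: pairs of bytes become chars, so this prints garbage.
        System.out.println(ByteBuffer.wrap(bytes).asCharBuffer());
    }
}
```

For one-off conversions, StandardCharsets.UTF_8.decode(ByteBuffer) is a shorter alternative, though it replaces malformed input instead of reporting it.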
Upvotes: 2
Reputation: 12222
For the first question: I believe it uses the native character encoding of Java (UTF-16).
Upvotes: 4