Reputation: 69
I am writing a function in Java that reads a file and returns its content as a String:
public static String ReadFromFile(String fileLocation) {
    StringBuilder result = new StringBuilder();
    RandomAccessFile randomAccessFile = null;
    FileChannel fileChannel = null;
    try {
        randomAccessFile = new RandomAccessFile(fileLocation, "r");
        fileChannel = randomAccessFile.getChannel();
        ByteBuffer byteBuffer = ByteBuffer.allocate(10);
        CharBuffer charBuffer = null;
        int bytesRead = fileChannel.read(byteBuffer);
        while (bytesRead != -1) {
            byteBuffer.flip();
            charBuffer = StandardCharsets.UTF_8.decode(byteBuffer);
            result.append(charBuffer.toString());
            byteBuffer.clear();
            bytesRead = fileChannel.read(byteBuffer);
        }
    } catch (IOException ignored) {
    } finally {
        try {
            if (fileChannel != null)
                fileChannel.close();
            if (randomAccessFile != null)
                randomAccessFile.close();
        } catch (IOException ignored) {
        }
    }
    return result.toString();
}
As the code above shows, I deliberately pass only 10 bytes to 'ByteBuffer.allocate' to make the problem easier to see. Now I want to read a file named "test.txt" that contains Chinese Unicode characters like this:
乐正绫我爱你乐正绫我爱你
Below is my test code for it:
System.out.println(ReadFromFile("test.txt"));
Expected Output in Console
乐正绫我爱你乐正绫我爱你
Actual Output in Console
乐正绫���爱你��正绫我爱你
Possible Reason
The ByteBuffer holds only 10 bytes, while each of these Chinese characters takes 3 bytes in UTF-8, so a character is split wherever a 10-byte chunk boundary falls inside it.
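A minimal sketch of that effect, assuming the text is UTF-8 encoded. The class and method names are made up for illustration and are not part of the code above; the sketch decodes only a prefix of the encoded bytes, the way one 10-byte chunk would be decoded on its own.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Utf8SplitDemo {
    // Hypothetical helper: decodes only the first `limit` bytes of the
    // UTF-8 encoding of `text`, as if they were one independent chunk.
    static String decodePrefix(String text, int limit) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int length = Math.min(limit, bytes.length);
        ByteBuffer chunk = ByteBuffer.wrap(bytes, 0, length);
        // Charset.decode replaces malformed input with U+FFFD.
        return StandardCharsets.UTF_8.decode(chunk).toString();
    }

    public static void main(String[] args) {
        String text = "乐正绫我爱你";
        // Six characters, three bytes each.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 18
        // Nine bytes end exactly on a character boundary.
        System.out.println(decodePrefix(text, 9));  // 乐正绫
        // Ten bytes end with the first byte of the fourth character,
        // which is decoded as the replacement character U+FFFD.
        System.out.println(decodePrefix(text, 10)); // 乐正绫�
    }
}
```

The remaining two bytes of the split character then start the next chunk, where they are malformed on their own as well, which matches the run of � in the actual output.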
Attempt To Solve
When I increased the ByteBuffer allocation to 20 bytes, I got the result below:
乐正绫我爱你��正绫我爱你
Not A Robust Solution
Allocate a very large ByteBuffer, e.g. 102400 bytes. This is not practical for very large text files.
Question
How can I solve this problem?
Upvotes: 0
Views: 491
Reputation: 73538
You can't do it by decoding each fixed-size chunk on its own, since you don't know how many bytes each character uses in UTF-8 (anywhere from one to four), and you really don't want to rewrite that logic yourself.
There's Files.readString() in Java 11. For lower versions you can use Files.readAllBytes(), e.g.
Path path = new File(fileLocation).toPath();
String contents = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
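For completeness, here is a self-contained sketch of the readAllBytes approach. It also includes a chunked variant for files too large to hold as a single byte array, which assumes that a Reader from Files.newBufferedReader is acceptable in place of the FileChannel. The class name, method names, and chunkChars parameter are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadFileSketch {
    // Reads the whole file into memory, then decodes it in one step,
    // so no character can be split across a buffer boundary.
    static String readAll(String fileLocation) throws IOException {
        Path path = Paths.get(fileLocation);
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Reads the file in chunks of up to chunkChars characters (must be > 0).
    // The Reader does the UTF-8 decoding and carries incomplete byte
    // sequences over to the next read, so chunk size does not affect the result.
    static String readChunked(String fileLocation, int chunkChars) throws IOException {
        StringBuilder result = new StringBuilder();
        char[] chunk = new char[chunkChars];
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(fileLocation), StandardCharsets.UTF_8)) {
            int charsRead;
            while ((charsRead = reader.read(chunk)) != -1) {
                result.append(chunk, 0, charsRead);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReadFileSketch test.txt
        if (args.length > 0) {
            System.out.println(readAll(args[0]));
            System.out.println(readChunked(args[0], 10));
        }
    }
}
```

One difference to keep in mind: new String(bytes, charset) silently replaces malformed bytes with U+FFFD, whereas the reader returned by Files.newBufferedReader throws an IOException on malformed input. Unlike the original function, both methods here let IOException propagate instead of swallowing it.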
Upvotes: 2