Reputation: 3
I have split the file using the code below:
int sizeOfFiles = 1024 * 3; // 3072 bytes (3 KB)
byte[] buffer = new byte[sizeOfFiles];
// String fileName = f.getName();
//try-with-resources to ensure closing stream
try (ByteArrayInputStream fis = new ByteArrayInputStream(f);) {
int bytesAmount = 0;
int i=0;
while ((bytesAmount = fis.read(buffer)) > 0) {
String result="";
for (byte b : buffer) {
result+=(char)b;
}
System.out.println(result);
System.out.print("--------------------------------------------------------");
}
}
}
But when I copy the first 3072-byte chunk of the output and paste it into Notepad++, it shows that the same data is more than 3072 bytes. Can you please help me with this issue?
Note: I am using Windows Server and Eclipse, and the file (or string) is encoded in UTF-8.
Upvotes: 0
Views: 212
Reputation: 719336
The first problem is that there is a bug in this line:
for (byte b : buffer) {
You are assuming that all of the byte positions in buffer contain data. But what if the read call returned fewer than sizeOfFiles bytes?
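A minimal sketch of one way to respect the return value of read: copy only the first bytesAmount bytes of the buffer into each chunk. The class name ChunkSplitter, the method split, and the 7000-byte test array are made up for illustration and are not from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSplitter {
    // Hypothetical helper: splits data into chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (ByteArrayInputStream in = new ByteArrayInputStream(data)) {
            int bytesAmount;
            while ((bytesAmount = in.read(buffer)) > 0) {
                // Keep only the bytes that this read call actually filled in;
                // the rest of the buffer still holds data from an earlier read.
                chunks.add(Arrays.copyOf(buffer, bytesAmount));
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // 7000 bytes split into 3072-byte chunks: 3072, 3072, 856
        for (byte[] chunk : split(new byte[7000], 1024 * 3)) {
            System.out.println(chunk.length);
        }
    }
}
```

With the original for-each loop over buffer, the last chunk would have been reported as 3072 bytes, with the tail made up of stale data from the previous read.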
The second problem is that this line is liable to mangle the data.
result += (char) b;
You are taking each byte of input and casting it to a character. But if the input file is binary, those bytes don't represent characters. Alternatively, if the input is text, then a real character in the input may be encoded as 2 or more bytes. Either way, when you cast from a byte to char you are not getting proper Unicode code units to append to the string. (The only case where what you are doing would "work" is if the input file is ASCII encoded text. Even LATIN-1 bytes above 127 are negative as a Java byte, so they would need to be masked with b & 0xFF before the cast.)
This mangling may well be increasing the number of bytes relative to the input stream, especially if you are outputting in UTF-8. Any input byte in the range 128 to 255 is a negative Java byte, so the cast sign-extends it to a char in the range U+FF80 to U+FFFF, which takes 3 bytes when encoded in UTF-8.
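To make the size change concrete, here is a small sketch that compares the per-byte cast with decoding the bytes as UTF-8. The class name CastVersusDecode and the sample character are chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

public class CastVersusDecode {
    // Reproduces the approach from the question: cast every byte to a char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b); // negative bytes are sign-extended to U+FF80..U+FFFF
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is 2 bytes in UTF-8: 0xC3 0xA9
        byte[] input = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String mangled = castEachByte(input);
        String decoded = new String(input, StandardCharsets.UTF_8);

        System.out.println(input.length);                                     // 2
        System.out.println(mangled.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(decoded.getBytes(StandardCharsets.UTF_8).length); // 2
    }
}
```

Note that decoding each fixed-size chunk separately can still split a multi-byte character at a chunk boundary. If the goal is to process text rather than raw bytes, reading through an InputStreamReader with an explicit charset avoids that.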
Finally, when you use println to output the string you are adding an extra line separator after each buffer-full of data.
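If the aim is only to pass the chunks along unchanged, one option is to skip the String conversion entirely and write the raw bytes, which adds no separators and no re-encoding. This is a sketch under that assumption; RawChunkCopy and copyInChunks are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class RawChunkCopy {
    // Hypothetical helper: copies data to out in chunks of at most chunkSize bytes.
    static void copyInChunks(byte[] data, int chunkSize, OutputStream out) throws IOException {
        byte[] buffer = new byte[chunkSize];
        try (ByteArrayInputStream in = new ByteArrayInputStream(data)) {
            int bytesAmount;
            while ((bytesAmount = in.read(buffer)) > 0) {
                // Write exactly the bytes that were read: no cast, no line separator.
                out.write(buffer, 0, bytesAmount);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyInChunks(new byte[7000], 1024 * 3, sink);
        System.out.println(sink.size()); // 7000, same as the input length
    }
}
```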
Upvotes: 1