Reputation: 4401
First of all, let me explain what I need to do. I need to read a file whose size can range from 1 byte to 2 GB. The 2 GB maximum exists because I use MappedByteBuffer for fast reading. Later I may read the file in chunks in order to handle files of arbitrary size.
As I read the file, I convert its bytes to chars (using ASCII encoding), append them to a StringBuilder, and then put this StringBuilder into an ArrayList.
However, I also need to do the following:
The user can specify blockSize, which is the number of chars I have to read into the StringBuilder (that is, the number of file bytes converted to chars).
Once I have collected the user-defined number of chars, I create a copy of the StringBuilder and put it into an ArrayList.
All of these steps are performed for every char read. The problem is with the StringBuilder: if the file is big (<500 MB), I get an OutOfMemoryError.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:80)
at java.lang.StringBuilder.<init>(StringBuilder.java:106)
at borrows.wheeler.ReadFile.readFile(ReadFile.java:43)
Java Result: 1
Here is my code; maybe someone can suggest improvements to it or suggest some alternatives.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;

public class ReadFile {
    // matrix block size
    public int blockSize = 100;
    public int charCounter = 0;

    public ArrayList readFile(File file) throws FileNotFoundException, IOException {
        FileChannel fc = new FileInputStream(file).getChannel();
        // map the whole file; the int cast limits this to files under 2 GB
        MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, (int) fc.size());
        ArrayList characters = new ArrayList();
        int counter = 0;
        StringBuilder sb = new StringBuilder();//blockSize-1
        while (mbb.hasRemaining()) {
            char charAscii = (char) mbb.get();
            counter++;
            charCounter++;
            if (counter == blockSize) {
                sb.append(charAscii);
                characters.add(new StringBuilder(sb));//new StringBuilder(sb)
                sb.delete(0, sb.length());
                counter = 0;
            } else {
                sb.append(charAscii);
            }
            if (!mbb.hasRemaining()) {
                characters.add(sb);
            }
        }
        fc.close();
        return characters;
    }
}
EDIT: I am doing a Burrows-Wheeler transform. For that I need to read each file and then, based on the block size, create as many matrices as needed. I believe the wiki explains it better than I can:
http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
Upvotes: 2
Views: 1101
Reputation: 310915
I try to use MappedByteBuffer for fast reading. Maybe later I will try to read file in chunks in order to read files of arbitrary size.
As I read the file, I convert its bytes to chars (using ASCII encoding), append them to a StringBuilder, and then put this StringBuilder into an ArrayList.
This sounds more like a problem than a solution. I suggest that the file already is ASCII, or character data; that it could be read pretty efficiently using a BufferedReader; and that it can be processed one line at a time.
So do that. You won't get even double the speed by using a MappedByteBuffer, and everything you're doing including the MappedByteBuffer is consuming memory on a truly heroic scale.
If the file isn't such that it can be processed line by line, or record by record, there is something badly wrong upstream.
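As a rough sketch of the streaming approach, assuming each block can be processed as soon as it is read rather than kept in a list: the class name BlockStreamer, its methods, and the handler callback are hypothetical names for illustration, and it reads fixed-size blocks instead of lines because the question works with blockSize.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper: streams a character source in blocks of blockSize chars.
final class BlockStreamer {

    /** Hands each full block (and a final shorter one, if any) to the handler; returns chars read. */
    static long streamBlocks(Reader in, int blockSize, Consumer<String> handler) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        char[] buf = new char[blockSize];
        long total = 0;
        int filled = 0;
        int n;
        // read() may return fewer chars than requested, so keep filling until the block is full
        while ((n = in.read(buf, filled, blockSize - filled)) != -1) {
            filled += n;
            total += n;
            if (filled == blockSize) {
                handler.accept(new String(buf, 0, filled));
                filled = 0; // reuse the same buffer for the next block
            }
        }
        if (filled > 0) {
            handler.accept(new String(buf, 0, filled)); // final, shorter block
        }
        return total;
    }

    /** Opens the file as ASCII text; bytes outside 7-bit ASCII cause an exception here. */
    static long streamFile(Path file, int blockSize, Consumer<String> handler) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.US_ASCII)) {
            return streamBlocks(reader, blockSize, handler);
        }
    }
}
```

Only one block of blockSize chars is held in memory at a time, so the heap needed no longer grows with the file size, provided the handler does not store every block itself.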
Upvotes: 1
Reputation: 42607
If you load large files, it's not entirely surprising that you run out of memory.
How much memory do you have? Are you on a 64-bit system with 64-bit Java? How much heap memory have you allocated (e.g. using the -Xmx setting)?
Bear in mind that you will need at least twice as much memory as the filesize, because Java uses Unicode UTF-16, which uses at least 2 bytes for each character, but your input is one byte per character. So to load a 2GB file you will need at least 4GB allocated to the heap just for storing this text data.
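The arithmetic can be checked against the running JVM with a small sketch like the one below. HeapEstimate and minCharBytes are hypothetical names, and the figure is only a lower bound that ignores object overhead. On Java 9 and later, compact strings may store Latin-1-only text at one byte per char, so the doubling applies to older JVMs and to char[] storage.

```java
// Hypothetical helper: compares a lower bound for the text data with the JVM's heap limit.
final class HeapEstimate {

    /** Bytes needed to hold fileBytes single-byte characters as 2-byte Java chars. */
    static long minCharBytes(long fileBytes) {
        return fileBytes * Character.BYTES; // Character.BYTES is 2
    }

    public static void main(String[] args) {
        long fileBytes = 2L * 1024 * 1024 * 1024;        // a 2 GB input file
        long needed = minCharBytes(fileBytes);           // 4 GB of char data
        long maxHeap = Runtime.getRuntime().maxMemory(); // limit set by -Xmx or the JVM default
        System.out.printf("text needs at least %d MB, max heap is %d MB%n",
                needed >> 20, maxHeap >> 20);
    }
}
```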
Also, you need to sort out the logic in your code: you do the same sb.append(charAscii) in both the if and the else, and you test !mbb.hasRemaining() in every iteration of a while (mbb.hasRemaining()) loop.
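One possible way to tidy the loop, as a sketch rather than a drop-in replacement: BlockSplitter is a hypothetical name, it takes any ByteBuffer (a MappedByteBuffer is one), and unlike the original it does not add an empty trailing block when the input length is an exact multiple of blockSize.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: same splitting idea with one append and one end-of-input check.
final class BlockSplitter {

    static List<StringBuilder> split(ByteBuffer bytes, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<StringBuilder> blocks = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        while (bytes.hasRemaining()) {
            // single append; the mask keeps bytes >= 0x80 from turning into negative values
            sb.append((char) (bytes.get() & 0xFF));
            if (sb.length() == blockSize) {
                blocks.add(sb);           // hand over the full block, no copy needed
                sb = new StringBuilder(); // start a fresh one
            }
        }
        if (sb.length() > 0) {
            blocks.add(sb); // leftover partial block, checked once after the loop
        }
        return blocks;
    }
}
```

This changes only the control flow; it still holds the whole file in memory as chars, so it does not by itself avoid the OutOfMemoryError.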
As I asked in your previous question, do you need to store StringBuilders, or would the resulting Strings be OK? Storing strings would save space because StringBuilder allocates memory in big chunks (I think it doubles in size every time it runs out of space!) so may waste a lot.
If you do have to use StringBuilders, then pre-sizing them to the value of blockSize would make the code more memory-efficient (and faster).
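Both suggestions can be combined in a sketch like this, assuming Strings are acceptable as the stored form: StringBlocks is a hypothetical name, and a single StringBuilder pre-sized to blockSize is reused for every block.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: stores Strings and reuses one pre-sized StringBuilder.
final class StringBlocks {

    static List<String> split(ByteBuffer bytes, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> blocks = new ArrayList<>();
        // capacity is set once up front, so the builder never has to grow
        StringBuilder sb = new StringBuilder(blockSize);
        while (bytes.hasRemaining()) {
            sb.append((char) (bytes.get() & 0xFF));
            if (sb.length() == blockSize) {
                blocks.add(sb.toString()); // the String holds exactly blockSize chars
                sb.setLength(0);           // clear the content but keep the buffer
            }
        }
        if (sb.length() > 0) {
            blocks.add(sb.toString()); // final, shorter block
        }
        return blocks;
    }
}
```

Each stored String is sized to its content, so there is no unused builder capacity sitting in the list.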
Upvotes: 1