Reputation: 1
I am trying to preprocess a large text file (10 GB) and store it in a binary file for future use. As the code runs, it slows down and ends with:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
The input file has the following structure:
200020000000008;0;2
200020000000004;0;2
200020000000002;0;2
200020000000007;1;2
This is the code I am using:
String strLine;
FileInputStream fstream = new FileInputStream(args[0]);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
// Read the file line by line
HMbicnt map = new HMbicnt("-1");
ObjectOutputStream outputStream = new ObjectOutputStream(new FileOutputStream(args[1]));
int sepIndex = 15;
int sepIndex2 = 0;
String str_i = "";
String bb = "";
String bbBlock = "init";
int cnt = 0;
while ((strLine = br.readLine()) != null) {
    // parse the line
    str_i = strLine.substring(0, sepIndex);
    sepIndex2 = strLine.substring(sepIndex + 1).indexOf(';');
    bb = strLine.substring(sepIndex + 1, sepIndex + 1 + sepIndex2);
    cnt = Integer.parseInt(strLine.substring(sepIndex + 1 + sepIndex2 + 1));
    if (!bb.equals(bbBlock)) {
        // new block: write the finished map out and start a fresh one
        outputStream.writeObject(map);
        outputStream.flush();
        map = new HMbicnt(bb);
        map.addNew(str_i + ";" + bb, cnt);
        bbBlock = bb;
    } else {
        map.addNew(str_i + ";" + bb, cnt);
    }
}
// write the last block and release the reference
outputStream.writeObject(map);
map = null;
// Close the streams
br.close();
outputStream.close();
Basically, it goes through the input file and stores the data in an HMbicnt object (which is a hash map). Once it encounters a new value in the second column, it should write the object to the output file, free the memory, and continue.
Thanks for any help.
Upvotes: 0
Views: 1195
Reputation: 8200
I think the problem is not that 10 GB is in memory, but that you are creating too many HashMaps. Maybe you could clear the HashMap instead of re-creating it once you no longer need its contents, as sketched below. There seems to have been a similar problem in java.lang.OutOfMemoryError: GC overhead limit exceeded; it is also about HashMaps.
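A minimal sketch of that idea using a plain java.util.HashMap (your HMbicnt class isn't shown, so the data and structure here are assumptions):

import java.util.HashMap;
import java.util.Map;

public class BlockCounter {
    public static void main(String[] args) {
        // One map allocated up front and reused for every block,
        // instead of constructing a new map each time the block changes.
        Map<String, Integer> counts = new HashMap<>();

        String[][] rows = {{"a", "0"}, {"b", "0"}, {"c", "1"}}; // stand-in data
        String currentBlock = "init";

        for (String[] row : rows) {
            if (!row[1].equals(currentBlock)) {
                // ... write counts out here ...
                counts.clear();          // drop the entries, keep the map object
                currentBlock = row[1];
            }
            counts.merge(row[0], 1, Integer::sum);
        }
    }
}

Reusing one map avoids repeatedly allocating and discarding large backing arrays, which is what tends to drive the GC overhead.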
Upvotes: 2
Reputation: 13304
Simply put, you're using too much memory. Since, as you said, your file is 10 GB, there is no way you're going to be able to fit it all into memory (unless, of course, you happen to have over 10 GB of RAM and have configured Java to use it).
From what I can tell from your code and your description of it, you're reading the entire file into memory, adding it to one huge in-RAM map as you go, and then writing the result to the output. This is not feasible. You'll need to redesign your code to work in a streaming fashion (i.e. keep only a small portion of the file in memory at any given time).
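One detail worth knowing when writing many objects to a single ObjectOutputStream: the stream keeps a reference to every object it has written (so it can encode back-references), which by itself pins all the maps in memory even after you write them out. A sketch of a per-block write that releases that state (reset() is standard java.io.ObjectOutputStream API; the surrounding structure and file name are assumptions):

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class BlockWriter {
    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream("blocks.bin"))) {
            for (int block = 0; block < 3; block++) {
                HashMap<String, Integer> map = new HashMap<>();
                map.put("key" + block, block); // stand-in for one block's data

                out.writeObject(map);
                out.reset(); // forget previously written objects so they can
                             // be garbage collected; without this the stream
                             // retains a reference to every map it has seen
            }
        }
    }
}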
Upvotes: 1