Reputation: 1566
I am having a bit of an issue where I am looping through a file that is excessively large (approximately 2 GB). After about 5 minutes of running, I get the following error: OutOfMemoryError: GC overhead limit exceeded.
My code, which is fairly straightforward, is as follows:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Organiser {
    public static void main(String[] args) throws FileNotFoundException {
        ArrayList<String> lines = new ArrayList<>();
        String directory = "C:\\Users\\xxx\\Desktop\\Files\\combined";
        Scanner fileIn = new Scanner(new File(directory + ".txt"));
        while (fileIn.hasNextLine() == true) {
            lines.add(fileIn.nextLine());
            System.out.println("Reading.");
            System.out.println("Reading..");
            System.out.println("Reading...");
        }
        PrintWriter out = new PrintWriter(directory + "_ordered.txt");
        Collections.sort(lines);
        System.out.println("Ordering...");
        for (String output : lines) {
            out.println(output + "\n");
        }
        out.close();
        System.out.println("Complete - See " + directory + "_ordered.txt");
    }
}
How do I go about addressing this?
Upvotes: 1
Views: 1548
Reputation: 8803
When you see an OutOfMemoryError, it's time to optimize your program for lower memory consumption. Some typical easy gains you can achieve:
- Don't use an ArrayList plus Collections.sort for sorting a large amount of data. Instead, use a TreeSet, which automatically sorts its items according to their natural order (see the sketch below).
- Give the JVM a larger heap with the -Xmx option.
Take a look at this post, which is similar: Improving speed and memory consumption when handling ArrayList with 100 million elements
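A minimal sketch of the TreeSet variant applied to the question's code. Two caveats: a set silently drops duplicate lines, and every line still lives on the heap, so the -Xmx increase may still be needed.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.TreeSet;

public class Organiser {
    public static void main(String[] args) throws FileNotFoundException {
        String directory = "C:\\Users\\xxx\\Desktop\\Files\\combined";
        // A TreeSet keeps its elements sorted as they are inserted,
        // so no separate Collections.sort pass is needed.
        // Caveat: a set discards duplicate lines.
        TreeSet<String> lines = new TreeSet<>();
        Scanner fileIn = new Scanner(new File(directory + ".txt"));
        while (fileIn.hasNextLine()) {
            lines.add(fileIn.nextLine());
        }
        fileIn.close();
        PrintWriter out = new PrintWriter(directory + "_ordered.txt");
        for (String output : lines) {
            out.println(output);
        }
        out.close();
        System.out.println("Complete - See " + directory + "_ordered.txt");
    }
}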
Upvotes: 0
Reputation: 408
Try specifying the Java VM options when starting your program. If you are using an IDE, go to the run configurations and supply the -Xmx and -Xms flags with values large enough for sorting the contents of the large file. Setting the maximum to a high value of around 4 GB, along with wrapping the string content in UTF-8 encoded ByteBuffers instead of UTF-16 Strings, can help. For example:
javac Organiser.java
java -Xms1024m -Xmx4096m Organiser
Upvotes: 0
Reputation: 100249
If your file contains latin-1 symbols, you can save some memory by storing the lines as UTF-8 encoded ByteBuffers instead of Strings (Strings are represented in UTF-16, which can double the memory usage for latin-1-only input):
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
...
ArrayList<ByteBuffer> lines = new ArrayList<>();
...
while (fileIn.hasNextLine() == true) {
    lines.add(ByteBuffer.wrap(fileIn.nextLine().getBytes(StandardCharsets.UTF_8)));
...
for (ByteBuffer output : lines) {
    out.println(new String(output.array(), StandardCharsets.UTF_8));
}
...
Unlike a plain byte[] array, a ByteBuffer is comparable and can therefore be sorted.
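Put together with the question's code, the complete variant might look like this (a sketch under the same assumptions):

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Organiser {
    public static void main(String[] args) throws FileNotFoundException {
        ArrayList<ByteBuffer> lines = new ArrayList<>();
        String directory = "C:\\Users\\xxx\\Desktop\\Files\\combined";
        Scanner fileIn = new Scanner(new File(directory + ".txt"));
        while (fileIn.hasNextLine()) {
            // One byte per latin-1 character instead of two.
            lines.add(ByteBuffer.wrap(fileIn.nextLine().getBytes(StandardCharsets.UTF_8)));
        }
        fileIn.close();
        Collections.sort(lines); // ByteBuffer implements Comparable
        PrintWriter out = new PrintWriter(directory + "_ordered.txt");
        for (ByteBuffer output : lines) {
            out.println(new String(output.array(), StandardCharsets.UTF_8));
        }
        out.close();
    }
}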
Upvotes: 0
Reputation: 1459
Don't read the complete file at once; instead, read it in chunks. See InputStream.read(byte[]) for reading a block of bytes at a time.
Example Code:
try (FileInputStream is = new FileInputStream(new File("myFile"))) {
    byte[] chunk = new byte[1024];
    int chunkLen;
    while ((chunkLen = is.read(chunk)) != -1) {
        // your code..
    }
} catch (FileNotFoundException fnfE) {
    // file not found, handle case
} catch (IOException ioE) {
    // problem reading, handle case
}
Hope this will give you an idea.
That isn't exactly a Java problem. You need to look into an efficient algorithm for sorting data that isn't completely read into memory; a few adaptations of merge sort can achieve this.
Take a look at this: http://en.wikipedia.org/wiki/Merge_sort
and this: http://en.wikipedia.org/wiki/External_sorting
Basically, the idea is to break the file into smaller pieces, sort them (with merge sort or another method), and then use the merge step from merge sort to create the new, sorted file.
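A minimal sketch of that two-phase approach, assuming line-oriented text. The class name, CHUNK_LINES, file names, and run-file naming are all made up for the example:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {
    static final int CHUNK_LINES = 1_000_000; // tune to the available heap

    // Sort one in-memory chunk and spill it to a temporary "run" file.
    static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("combined.txt");          // illustrative names
        Path output = Paths.get("combined_ordered.txt");

        // Phase 1: read the input in chunks that fit in memory,
        // sort each chunk, and write it out as a sorted run.
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= CHUNK_LINES) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeRun(chunk));
            }
        }

        // Phase 2: k-way merge of the sorted runs using a priority queue
        // holding one (line, run-index) pair per run.
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<String[]> heap =
                new PriorityQueue<>(Comparator.comparing((String[] e) -> e[0]));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (int i = 0; i < runs.size(); i++) {
                BufferedReader r = Files.newBufferedReader(runs.get(i), StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) {
                    heap.add(new String[] { first, Integer.toString(i) });
                }
            }
            while (!heap.isEmpty()) {
                String[] smallest = heap.poll();
                out.write(smallest[0]);
                out.newLine();
                int idx = Integer.parseInt(smallest[1]);
                String next = readers.get(idx).readLine();
                if (next != null) {
                    heap.add(new String[] { next, smallest[1] });
                }
            }
        } finally {
            for (BufferedReader r : readers) {
                r.close();
            }
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }
}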
Upvotes: 0
Reputation: 533590
To sort very large files you may need to perform a merge sort of the largest chunks you can fit into memory. This is how the Unix sort utility does it. Note: you can just run sort from Java rather than implement it yourself.
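For instance, a minimal sketch of launching the system sort with ProcessBuilder (file names are illustrative, and this assumes a Unix-like system with sort on the PATH):

import java.io.IOException;

public class SortViaUnix {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Run the system sort utility; -o names the output file.
        Process p = new ProcessBuilder("sort", "-o", "combined_ordered.txt", "combined.txt")
                .inheritIO() // show sort's own output and errors on our console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("sort exited with status " + exit);
        }
    }
}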
A simpler option is to give the process more memory. You will need about 5 GB of heap or more: 2 GB of encoded text turns into 4 GB when stored as UTF-16, as Java does internally, plus space for the rest of your data structures.
Upvotes: 2