Raziel

Reputation: 1566

Out of memory handling while looping through large file - Java

I am having an issue where I am looping through a very large file (approximately 2 GB). After about 5 minutes of running, I get the following error: OutOfMemoryError: GC overhead limit exceeded.

My code, which is relatively straightforward, is as follows:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Organiser {
    public static void main(String[] args) throws FileNotFoundException {
        ArrayList<String> lines = new ArrayList<>();
        String directory = "C:\\Users\\xxx\\Desktop\\Files\\combined";
        Scanner fileIn = new Scanner(new File(directory + ".txt"));
        while (fileIn.hasNextLine() == true) {
            lines.add(fileIn.nextLine());
            System.out.println("Reading.");
            System.out.println("Reading..");
            System.out.println("Reading...");
        }

        PrintWriter out = new PrintWriter(directory + "_ordered.txt");
        Collections.sort(lines);
        System.out.println("Ordering...");
        for (String output : lines) {
            out.println(output + "\n");
        }       
        out.close();
        System.out.println("Complete - See " + directory + "_ordered.txt");
    }
}

How do I go about addressing this?

Upvotes: 1

Views: 1548

Answers (5)

Little Santi

Reputation: 8803

When you see an OutOfMemoryError, it's time to optimize your program for lower memory consumption.

Some typical "easy-gains" you can achieve:

  • Do not use ArrayList and Collections.sort for sorting large amounts of data: instead, use TreeSet, which automatically keeps its items sorted according to their natural order. (Note that a TreeSet also discards duplicate entries.)
  • If that is not enough, increase the JVM heap through the -Xmx option.

Take a look at this post which is similar: Improving speed and memory consumption when handling ArrayList with 100 million elements
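To make the TreeSet suggestion concrete, here is a minimal sketch (the class and method names are illustrative, not from the question). Because a TreeSet drops duplicates, it is only equivalent to ArrayList + Collections.sort when all lines are distinct:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDemo {
    // Inserts lines into a TreeSet, which keeps them sorted as they arrive,
    // so no separate Collections.sort pass over a huge ArrayList is needed.
    static List<String> sortedUnique(List<String> input) {
        Set<String> sorted = new TreeSet<>(input); // duplicates silently dropped
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        System.out.println(sortedUnique(
                List.of("cherry", "apple", "banana", "apple")));
    }
}
```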

Upvotes: 0

deepak marathe

Reputation: 408

Try specifying the Java VM options when starting your program. If you are using an IDE, go to the run configurations and supply the -Xmx and -Xms flags with values large enough for sorting the contents of the large file. Setting the maximum heap to a high value of around 4 GB, along with storing the lines as UTF-8 encoded ByteBuffers instead of UTF-16 Strings, can help.

    javac Organiser.java
    java -Xms1024m -Xmx4096m Organiser

Upvotes: 0

Tagir Valeev

Reputation: 100249

If your file contains only Latin-1 characters, you can save some memory by storing the lines as UTF-8 ByteBuffers instead of Strings (Strings are represented in UTF-16 internally, which can take twice the memory for Latin-1-only input):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

...
    ArrayList<ByteBuffer> lines = new ArrayList<>();
...
    while (fileIn.hasNextLine() == true) {
        lines.add(ByteBuffer.wrap(fileIn.nextLine().getBytes(StandardCharsets.UTF_8)));
...
    for (ByteBuffer output : lines) {
        out.println(new String(output.array(), StandardCharsets.UTF_8));
    }       
...

Unlike a plain byte[] array, ByteBuffer implements Comparable, so it can be sorted.
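Here is the same idea as a self-contained sketch (illustrative class and method names, not from the answer). One caveat not mentioned above: ByteBuffer comparison is byte-wise on signed bytes, so it matches String ordering for ASCII content, but UTF-8 bytes above 0x7F compare as negative values and can order differently from Strings:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ByteBufferSortDemo {
    // Wraps each line as a UTF-8 ByteBuffer (roughly half the memory of a
    // UTF-16 String for Latin-1 text), sorts, then decodes back for output.
    static List<String> sortLines(List<String> input) {
        List<ByteBuffer> lines = new ArrayList<>();
        for (String s : input) {
            lines.add(ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)));
        }
        Collections.sort(lines); // ByteBuffer implements Comparable
        List<String> result = new ArrayList<>();
        for (ByteBuffer b : lines) {
            result.add(new String(b.array(), StandardCharsets.UTF_8));
        }
        return result;
    }
}
```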

Upvotes: 0

Ankush soni

Reputation: 1459

Don't read the complete file at once; instead, read it in chunks.

See InputStream.read(byte[]) for reading a block of bytes at a time.

Example Code:

// requires java.io.File, FileInputStream, FileNotFoundException, IOException
try (FileInputStream is = new FileInputStream(new File("myFile"))) {
    byte[] chunk = new byte[1024];
    int chunkLen;
    while ((chunkLen = is.read(chunk)) != -1) {
        // process the chunkLen bytes read into chunk here
    }
} catch (FileNotFoundException fnfE) {
    // file not found, handle case
} catch (IOException ioE) {
    // problem reading, handle case
}

Hope this will give you an idea.

That isn't exactly a Java problem. You need an algorithm that can sort data without reading it completely into memory. A few adaptations of merge sort can achieve this.

Take a look at this: http://en.wikipedia.org/wiki/Merge_sort

and: http://en.wikipedia.org/wiki/External_sorting

Basically, the idea is to break the file into smaller pieces, sort each piece (with merge sort or another method), and then use the merge step of merge sort to produce the new, sorted file.
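The steps above can be sketched in Java as follows (a minimal illustration with hypothetical class and method names; the chunk size and temp-file handling would need tuning for a real 2 GB file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    // Phase 1: read the input in runs of at most maxLines lines, sort each
    // run in memory, and write it to its own temporary file.
    static List<Path> splitAndSort(Path input, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
                if (lines.size() >= maxLines) {
                    chunks.add(writeChunk(lines));
                    lines.clear();
                }
            }
            if (!lines.isEmpty()) chunks.add(writeChunk(lines));
        }
        return chunks;
    }

    static Path writeChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("sort-chunk", ".txt");
        Files.write(chunk, lines);
        return chunk;
    }

    // Phase 2: k-way merge of the sorted chunks. A priority queue keeps the
    // smallest current line from each chunk at its head.
    static void merge(List<Path> chunks, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        // Queue entries: {line, index of the reader it came from}
        PriorityQueue<String[]> queue =
                new PriorityQueue<>((a, b) -> a[0].compareTo(b[0]));
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            for (Path chunk : chunks) {
                BufferedReader r = Files.newBufferedReader(chunk);
                readers.add(r);
                String line = r.readLine();
                if (line != null) {
                    queue.add(new String[]{line, String.valueOf(readers.size() - 1)});
                }
            }
            while (!queue.isEmpty()) {
                String[] head = queue.poll();
                out.println(head[0]);
                int idx = Integer.parseInt(head[1]);
                String next = readers.get(idx).readLine();
                if (next != null) queue.add(new String[]{next, head[1]});
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    public static void sort(Path input, Path output, int maxLinesInMemory)
            throws IOException {
        merge(splitAndSort(input, maxLinesInMemory), output);
    }
}
```

Only maxLinesInMemory lines are ever held in the heap at once, so the memory needed is bounded regardless of how large the input file is.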

Upvotes: 0

Peter Lawrey

Reputation: 533590

To sort very large files you need to perform a merge sort of the largest chunks you can fit into memory. This is how the Unix sort utility does it. Note: you can simply run sort from Java rather than implement it yourself.
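Invoking the external utility could look like this (a sketch with hypothetical file names; it assumes a Unix-like system with sort on the PATH):

```java
import java.io.IOException;

public class SortViaProcess {
    // Runs the external Unix sort utility on inputFile, writing the sorted
    // result to outputFile via sort's -o option.
    static void sortFile(String inputFile, String outputFile)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("sort", "-o", outputFile, inputFile)
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IllegalStateException("sort exited with code " + exit);
        }
    }
}
```

For example, sortFile("combined.txt", "combined_ordered.txt") would replace the whole program in the question, with the sorting done outside the JVM heap.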

A simpler option is to give the process more memory. You will need about 5 GB of heap or more: 2 GB of encoded text becomes roughly 4 GB when stored as UTF-16, as Java does internally, plus space for the rest of your data structures.

Upvotes: 2
