halloei

Reputation: 2036

Using multiple threads makes the Garbage Collector use 100% of CPU time

I read a 1.3 GB text file line by line, extract and format the content to fit my needs, and save it to a new text file.

Originally I just used the main thread. But the extracting and formatting takes a lot of CPU time, so I wanted to accelerate it with multi-threading.

But this is what my profiler shows:

[profiler screenshot: GC activity climbing toward 100%]

The garbage collector time rises to 100% as soon as I start using multiple threads, and java.lang.OutOfMemoryError: GC overhead limit exceeded errors are thrown.

I have a function that processes a single line, and I execute it via a newFixedThreadPool. It makes no difference whether I assign one or four threads to the pool.

Using different profilers, I can't find out which code causes the problem. And I don't understand why GC time is at 0.0% when I only use the main thread.

Does anyone have an idea, even without seeing the code?


Update: I tried to abstract some code:

A.java

ExecutorService executor = Executors.newFixedThreadPool(4);

while((line = reader.readLine()) != null) {
    Runnable processLine = new Runnable() {
        private String line;

        private Runnable init(String line) {
            this.line = line;
            return this;
        }

        @Override
        public void run() {
            processLine(line); // @B.java
        }
    }.init(line);

    executor.execute(processLine);
}

B.java

public void processLine(String line) {
    String[][] outputLines = new String[x][y];
    String field;

    for(... x ...) {
        for(... y ...) {
            field = extractField(line); // @C.java
            ...
            outputLines[x][y] = formatField(field); // @C.java
        }
    }

    write(outputLines); // write the generated lines to BufferedWriter(s)
}

C.java

public String extractField(String line) {
    if(filetype.equals("csv")) {
        String[] splitLine = line.split(";");

        return splitLine[position];
    }
    ...
}

public String formatField(String field) {
    if(trim) {
        field = field.trim();
    }
    ...
}

Upvotes: 0

Views: 617

Answers (1)

Stephen C

Reputation: 718798

I expect that your application is using close to all of the available heap space. This will cause more and more of the JVM's time to be used running the garbage collector ... in a vain attempt to reclaim space. This is precisely the situation that the GC overhead limit is designed to deal with.

In short, either you have a storage leak, or your application needs more memory.


I have a function that processes a single line, and I execute it via a newFixedThreadPool. It makes no difference whether I assign one or four threads to the pool.

That is strongly suggestive (to me) that it is not the threads that cause the problem, but the way that you have implemented the multi-threading.


UPDATE

I think that the root cause of your problems is this:

    ExecutorService executor = Executors.newFixedThreadPool(4);

Note that the javadoc says that that method creates an executor with an unbounded work queue.
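For reference, the Executors javadoc spells out the equivalent construction, and you can verify the queue type directly (a small demo, not part of the question's code):

```java
import java.util.concurrent.*;

public class FixedPoolQueueDemo {
    public static void main(String[] args) {
        // Per the javadoc, Executors.newFixedThreadPool(4) is equivalent to:
        ExecutorService equivalent = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>()); // unbounded work queue

        // The factory method really does hand back such a pool:
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        System.out.println(pool.getQueue() instanceof LinkedBlockingQueue); // prints true

        equivalent.shutdown();
        pool.shutdown();
    }
}
```

A LinkedBlockingQueue with no capacity argument can hold Integer.MAX_VALUE elements, so nothing stops the reader thread from enqueuing the whole file.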

Suppose that your reader thread (the one executing the A code) can read and enqueue lines significantly faster than the worker threads can process, format and output them.

In that case, what happens is that the executor's work queue will get longer and longer, until eventually it gets so long that the heap is close to full of reachable objects. That will cause the GC to take a long time tracing the queue, and you'll get an OOME.

If this scenario is borne out in practice, your application has the bad aspects of a memory leak ... even if you want to argue that it is not actually a leak.

The solution is simple: create an executor with a bounded work queue. You will need to do this by instantiating a ThreadPoolExecutor directly, and providing a suitably configured work queue object.
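A minimal sketch of that, where the queue capacity (1000) and the rejection policy are illustrative choices to tune, not values from the question. CallerRunsPolicy means that when the queue is full, the submitting (reader) thread runs the task itself, which naturally throttles reading to the processing speed:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolDemo {
    static final AtomicInteger PROCESSED = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue: at most 1000 lines are ever waiting in memory.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 4,                        // fixed pool of 4 worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1000),
                // When the queue is full, the reader thread runs the task
                // itself instead of enqueuing it -- back-pressure for free.
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 10_000; i++) {   // stand-in for the readLine() loop
            final String line = "line " + i;
            executor.execute(() -> processLine(line));
        }

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(PROCESSED.get()); // prints 10000
    }

    static void processLine(String line) {   // stand-in for the real per-line work
        PROCESSED.incrementAndGet();
    }
}
```

With this setup the heap only ever holds the pool's threads, the in-flight tasks, and at most 1000 queued lines, regardless of how large the input file is.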

Upvotes: 2
