Reputation: 787
I am trying to read multiple lines at a time in my mapper. For that I started using the NLineInputFormat class. While using it, I am getting a GC overhead limit error. For reference, the error output is:
16/02/21 01:37:13 INFO mapreduce.Job: map 0% reduce 0%
16/02/21 01:37:38 WARN mapred.LocalJobRunner: job_local726191039_0001
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019)
at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084)
at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:852)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:713)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:442)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.<init>(LocalJobRunner.java:217)
at org.apache.hadoop.mapred.LocalJobRunner$Job.getMapTaskRunnables(LocalJobRunner.java:272)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:517)
16/02/21 01:37:39 INFO mapreduce.Job: Job job_local726191039_0001 failed with state FAILED due to: NA
For reference, please find the code snippet below.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobLauncher {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "TestDemo");
        job.setJarByClass(JobLauncher.class);

        job.setMapperClass(CSVMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(NullWritable.class);

        conf.setInt(NLineInputFormat.LINES_PER_MAP, 3);
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path(args[0]));

        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
I just have a simple CSVMapper. Why am I getting this error? Please help me resolve it.
Thanks in advance.
Upvotes: 2
Views: 705
Reputation: 718856
Why am I getting this error?
In general, the most likely explanations for an OOME are that you have run out of memory because your application genuinely needs more heap than it has been given, or because it is holding on to (or leaking) more objects than it should.
(With this particular "flavour" of OOME, you haven't completely run out of memory. However, in all likelihood you are close to running out, and that has caused the GC CPU utilization to spike, exceeding the "GC overhead" threshold. This detail doesn't change the way you should try to solve your problem.)
In your case, it looks like the error is occurring while you are loading input from a file into a map (or collection of maps). The inference is therefore that you have told Hadoop to load more data than is going to fit in memory at one time.
Please help me resolve this error.
Solutions: either give the JVM more heap space, so that what you are asking Hadoop to load actually fits, or reconfigure the job so that less data has to be held in memory at any one time. (A hedged configuration sketch of both options follows below.)
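For illustration only, here is a minimal sketch of those two options, assuming the standard org.apache.hadoop.mapreduce.lib.input.NLineInputFormat API; the helper class name, the heap size and the lines-per-split value are placeholders, not values taken from the question.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class HeapFriendlyJobSetup {   // hypothetical helper class, for illustration
    public static Job configure() throws IOException {
        Configuration conf = new Configuration();

        // Option 1: request a larger heap for each map task JVM.
        // (This applies to tasks launched on a cluster; with LocalJobRunner,
        // as in the log above, the maps run inside the client JVM, so the
        // client's own -Xmx setting is the relevant knob there.)
        conf.set("mapreduce.map.java.opts", "-Xmx2048m"); // example value

        Job job = Job.getInstance(conf, "TestDemo");
        job.setInputFormatClass(NLineInputFormat.class);

        // Option 2: pack more input lines into each split, so far fewer map
        // tasks (and far fewer per-task JobConf copies) are created.
        // setNumLinesPerSplit() writes to the job's own configuration, so it
        // takes effect even though Job.getInstance() has already copied conf.
        NLineInputFormat.setNumLinesPerSplit(job, 1000); // example value

        return job;
    }
}

If the job really is running through LocalJobRunner (as the log suggests), the heap that matters is the submitting JVM's, which is usually raised via the client's own JVM options (for example HADOOP_CLIENT_OPTS) rather than the per-task settings.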
If you need a more specific answer, you will need to provide more details.
Upvotes: 1
Reputation: 38910
Adding to Stephen C's answer, which lists possible solutions:
From the Oracle documentation:
Exception in thread thread_name: java.lang.OutOfMemoryError: GC Overhead limit exceeded
Cause: The detail message "GC overhead limit exceeded" indicates that the garbage collector is running all the time and the Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection and if it is recovering less than 2% of the heap and has been doing so for the last 5 (compile time constant) consecutive garbage collections, then a java.lang.OutOfMemoryError is thrown.
This exception is typically thrown because the amount of live data barely fits into the Java heap having little free space for new allocations.
Action: Increase the heap size. The java.lang.OutOfMemoryError exception for GC Overhead limit exceeded can be turned off with the command line flag -XX:-UseGCOverheadLimit.
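To make the cause described above concrete, here is a tiny standalone sketch (unrelated to the Hadoop job itself) in which live data keeps growing until it barely fits in the heap, so the collector runs constantly and recovers almost nothing. Run with a small heap such as -Xmx64m it will typically fail with this error, although depending on the collector it may instead report plain "Java heap space"; the class name is hypothetical.

import java.util.ArrayList;
import java.util.List;

public class GcOverheadDemo {   // hypothetical demo class, for illustration only
    public static void main(String[] args) {
        // Every allocation stays reachable, so nothing can be collected and
        // each GC cycle reclaims less and less until the overhead limit trips.
        List<long[]> retained = new ArrayList<>();
        while (true) {
            retained.add(new long[1024]);
        }
    }
}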
Have a look at this SE question for better handling of this error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Upvotes: 0