Reputation: 787
I am trying to read multiple lines at a time in my mapper. For that I started using the NLineInputFormat class. While using it, I am getting a GC overhead limit error. For reference, the error output is:
16/02/21 01:37:13 INFO mapreduce.Job: map 0% reduce 0%
16/02/21 01:37:38 WARN mapred.LocalJobRunner: job_local726191039_0001
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019)
at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084)
at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:852)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:713)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:442)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.<init>(LocalJobRunner.java:217)
at org.apache.hadoop.mapred.LocalJobRunner$Job.getMapTaskRunnables(LocalJobRunner.java:272)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:517)
16/02/21 01:37:39 INFO mapreduce.Job: Job job_local726191039_0001 failed with state FAILED due to: NA
For reference, please find the code snippet below.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobLauncher {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "TestDemo");
        job.setJarByClass(JobLauncher.class);

        job.setMapperClass(CSVMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(NullWritable.class);

        conf.setInt(NLineInputFormat.LINES_PER_MAP, 3);
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path(args[0]));

        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
I just have a simple CSVMapper. Why am I getting this error? Please help me resolve it.
Thanks in advance.
Upvotes: 2
Views: 705
Reputation: 718856
Why am I getting this error?
In general, the most likely explanations for an OOME are that you have run out of memory because your application genuinely needs more heap than it has been given, or because it is holding on to (or leaking) more objects than it should.
(With this particular "flavour" of OOME, you haven't completely run out of memory. However, in all likelihood you are close to running out, and that has caused the GC CPU utilization to spike, exceeding the "GC overhead" threshold. This detail doesn't change the way you should try to solve your problem.)
In your case, it looks like the error is occurring while you are loading input from a file into a map (or collection of maps). The inference is therefore that you have told Hadoop to load more data than is going to fit in memory at one time.
Please help me resolve this error.
Solutions: either give the JVM more heap space, so that what you are asking Hadoop to load actually fits, or reconfigure the job so that less data has to be held in memory at any one time. (A hedged configuration sketch of both options follows below.)
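For illustration only, here is a minimal sketch of those two options, assuming the standard org.apache.hadoop.mapreduce.lib.input.NLineInputFormat API; the helper class name, the heap size and the lines-per-split value are placeholders, not values taken from the question.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class HeapFriendlyJobSetup {   // hypothetical helper class, for illustration
    public static Job configure() throws IOException {
        Configuration conf = new Configuration();

        // Option 1: request a larger heap for each map task JVM.
        // (This applies to tasks launched on a cluster; with LocalJobRunner,
        // as in the log above, the maps run inside the client JVM, so the
        // client's own -Xmx setting is the relevant knob there.)
        conf.set("mapreduce.map.java.opts", "-Xmx2048m"); // example value

        Job job = Job.getInstance(conf, "TestDemo");
        job.setInputFormatClass(NLineInputFormat.class);

        // Option 2: pack more input lines into each split, so far fewer map
        // tasks (and far fewer per-task JobConf copies) are created.
        // setNumLinesPerSplit() writes to the job's own configuration, so it
        // takes effect even though Job.getInstance() has already copied conf.
        NLineInputFormat.setNumLinesPerSplit(job, 1000); // example value

        return job;
    }
}

If the job really is running through LocalJobRunner (as the log suggests), the heap that matters is the submitting JVM's, which is usually raised via the client's own JVM options (for example HADOOP_CLIENT_OPTS) rather than the per-task settings.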
If you need a more specific answer, you will need to provide more details.
Upvotes: 1
Reputation: 38910
Adding to Stephen C's answer, which lists possible solutions:
From the Oracle documentation:
Exception in thread thread_name: java.lang.OutOfMemoryError: GC Overhead limit exceeded
Cause: The detail message "GC overhead limit exceeded" indicates that the garbage collector is running all the time and the Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection and if it is recovering less than 2% of the heap and has been doing so for the last 5 (compile time constant) consecutive garbage collections, then a java.lang.OutOfMemoryError is thrown.
This exception is typically thrown because the amount of live data barely fits into the Java heap having little free space for new allocations.
Action: Increase the heap size. The java.lang.OutOfMemoryError exception for GC Overhead limit exceeded can be turned off with the command line flag -XX:-UseGCOverheadLimit.
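To make the cause described above concrete, here is a tiny standalone sketch (unrelated to the Hadoop job itself) in which live data keeps growing until it barely fits in the heap, so the collector runs constantly and recovers almost nothing. Run with a small heap such as -Xmx64m it will typically fail with this error, although depending on the collector it may instead report plain "Java heap space"; the class name is hypothetical.

import java.util.ArrayList;
import java.util.List;

public class GcOverheadDemo {   // hypothetical demo class, for illustration only
    public static void main(String[] args) {
        // Every allocation stays reachable, so nothing can be collected and
        // each GC cycle reclaims less and less until the overhead limit trips.
        List<long[]> retained = new ArrayList<>();
        while (true) {
            retained.add(new long[1024]);
        }
    }
}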
Have a look at this SE question for better handling of this error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Upvotes: 0