Reputation: 1037
Could someone help me resolve the GC overhead error?
Background: This is a Pig script that loads data from 4 Hive tables through HCatalog. The Hive tables are stored as SequenceFiles and partitioned by date. The data to be loaded is approximately 24 TB.
The same script has run successfully for 16 TB.
Issue: The job fails while trying to read data from Hive. The failure happens before an application ID is even submitted for the MapReduce job, so I could not find any logs in YARN.
I tried setting yarn.app.mapreduce.am.resource.mb to 6 GB, mapreduce.map.memory.mb to 6 GB, mapreduce.map.java.opts to 80% of 6 GB, mapreduce.reduce.memory.mb to 8 GB, and mapreduce.reduce.java.opts accordingly, and I still get the same error.
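For reference, this is roughly how I am declaring those settings at the top of the Pig script (the heap sizes are just 80% of the container sizes mentioned above; values may need adjusting for the cluster):

    -- container and AM memory settings tried so far
    SET yarn.app.mapreduce.am.resource.mb 6144;
    SET mapreduce.map.memory.mb 6144;
    SET mapreduce.map.java.opts '-Xmx4915m';
    SET mapreduce.reduce.memory.mb 8192;
    SET mapreduce.reduce.java.opts '-Xmx6553m';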
Any help on this please?
Thank you.
Upvotes: 1
Views: 10495
Reputation: 744
There are two JVM settings that need to be changed:
1. -XX:+UseConcMarkSweepGC = switches to the CMS collector, which makes GC run more frequently and concurrently.
2. -XX:-UseGCOverheadLimit = disables the check that raises the "GC overhead limit exceeded" error.
In the Hive console, just run this and you should be good to go:
hive> SET mapred.child.java.opts=-Xmx4G -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit;
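Since your job is launched from a Pig script via HCatalog rather than from the Hive console, the equivalent (a minimal sketch, using the same values as the Hive command above) would be to put this near the top of the Pig script:

    -- raise the child JVM heap and change GC behaviour for the MapReduce tasks
    SET mapred.child.java.opts '-Xmx4G -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit';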
Upvotes: 5