Reputation: 111
When running a python job in AWS Glue I get the error:
Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead
When running this at the beginning of the script:
print '--- Before Conf --'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print "spark.yarn.executor.memoryOverhead", sc._conf.get("spark.yarn.executor.memoryOverhead")
print '--- Conf --'
sc._conf.setAll([('spark.yarn.executor.memory', '15G'),('spark.yarn.executor.memoryOverhead', '10G'),('spark.yarn.driver.cores','5'),('spark.yarn.executor.cores', '5'), ('spark.yarn.cores.max', '5'), ('spark.yarn.driver.memory','15G')])
print '--- After Conf ---'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print "spark.yarn.executor.memoryOverhead", sc._conf.get("spark.yarn.executor.memoryOverhead")
I get the following output:
--- Before Conf --
spark.yarn.driver.memory None
spark.yarn.driver.cores None
spark.yarn.executor.memory None
spark.yarn.executor.cores None
spark.yarn.executor.memoryOverhead None
--- Conf --
--- After Conf ---
spark.yarn.driver.memory 15G
spark.yarn.driver.cores 5
spark.yarn.executor.memory 15G
spark.yarn.executor.cores 5
spark.yarn.executor.memoryOverhead 10G
It seems like spark.yarn.executor.memoryOverhead is set, so why is it not recognized? I still get the same error.
I have seen other posts about problems setting spark.yarn.executor.memoryOverhead, but none where it appears to be set and still doesn't work.
Upvotes: 6
Views: 8818
Reputation: 1037
Open Glue > Jobs > Edit your Job > Script libraries and job parameters (optional) > Job parameters near the bottom
Set the following > key: --conf value: spark.yarn.executor.memoryOverhead=1024
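If you prefer to do this without the console, the same key/value can be passed as a job argument when starting a run. A minimal sketch using boto3 (the job name my-glue-job is hypothetical; the --conf value mirrors the parameter above):

import boto3

glue = boto3.client('glue')

# Pass the same key/value that the console's "Job parameters" field expects.
response = glue.start_job_run(
    JobName='my-glue-job',  # hypothetical job name, replace with your own
    Arguments={
        '--conf': 'spark.yarn.executor.memoryOverhead=1024'
    }
)
print(response['JobRunId'])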
Upvotes: 6
Reputation: 1939
Unfortunately, the current version of Glue doesn't support this functionality. You cannot set these parameters other than through the UI. In your case, instead of AWS Glue, you could use the AWS EMR service.
When I had a similar problem, I tried to reduce the number of shuffles and the amount of data shuffled, and to increase the DPU count. While working on this problem I relied on the following article; I hope it will be useful. (One way to reduce shuffling is sketched below the link.)
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
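For illustration, one common way to cut down shuffled data in PySpark is to broadcast the small side of a join so the large DataFrame is not shuffled. A minimal sketch (the table and column names here are made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrames: 'events' is a large fact table, 'lookup' a small dimension table.
events = spark.table('events')
lookup = spark.table('lookup')

# Broadcasting the small side keeps the large 'events' DataFrame from being shuffled for the join.
joined = events.join(broadcast(lookup), 'key')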
Updated: 2019-01-13
Amazon recently added a new section to the AWS Glue documentation that describes how to monitor and optimize Glue jobs. I think it is very useful for understanding where a memory-related problem comes from and how to avoid it.
https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-glue-job-cloudwatch-metrics.html
Upvotes: 2