Reputation: 20410
I launch a Python Spark program like this:
/usr/lib/spark/bin/spark-submit \
--master yarn \
--executor-memory 2g \
--driver-memory 2g \
--num-executors 2 --executor-cores 4 \
my_spark_program.py
I get the error:
Required executor memory (2048+4096 MB) is above the max threshold (5760 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
This is a brand-new EMR 5 cluster with one m3.2xlarge master node and two m3.xlarge core nodes. Everything should be set to defaults. I am currently the only user, running only one job on this cluster.
If I lower executor-memory from 2g to 1500m, it works. This seems awfully low. An EC2 m3.xlarge server has 15GB of RAM. These are Spark worker/executor machines, they have no other purpose, so I would like to use as much of that as possible for Spark.
Can someone explain how I go from having an EC2 worker instance with 15GB to being able to assign a Spark worker only 1.5GB?
On http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html I see that the m3.xlarge default for yarn.nodemanager.resource.memory-mb is 11520 MB, or 5760 MB with HBase installed. I'm not using HBase, but I believe it is installed on my cluster. Would removing HBase free up a lot of memory? Is yarn.nodemanager.resource.memory-mb the most relevant setting for available memory?
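For reference, if I wanted to raise that ceiling myself, EMR accepts property overrides for yarn-site as a configuration classification at cluster creation. A sketch only (the 11520 value is the documented non-HBase m3.xlarge default, not something I've verified on this cluster):

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.resource.memory-mb": "11520",
      "yarn.scheduler.maximum-allocation-mb": "11520"
    }
  }
]
```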
When I pass --executor-memory to spark-submit, is that per core or for the whole worker?
In the error Required executor memory (2048+4096 MB), the first value (2048) is what I pass to --executor-memory, and I can change it and see the error message change accordingly. What is the second value, 4096 MB? How can I change it? Should I change it?
I tried to post this issue to the AWS developer forum (https://forums.aws.amazon.com/forum.jspa?forumID=52), but I get the error "Your message quota has been reached. Please try again later." even though I haven't posted anything. Why would I not have permission to post a question there?
Upvotes: 2
Views: 1987
Reputation: 11593
Yes, if HBase is installed, it will use quite a bit of memory by default. You should not put it on your cluster unless you need it.
Your error would make sense if there were only one core node: 6 GB (4 GB for the two executors, 2 GB for the driver) would be more memory than your resource manager has available to allocate. With two core nodes, you should actually be able to allocate three 2 GB executors: one on the node with the driver, and two on the other.
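That packing can be sketched numerically. This assumes each core node exposes 5760 MB to YARN (the HBase-installed default from the AWS docs) and ignores per-container overhead, so treat it as an approximation only:

```python
# Hypothetical layout: driver + 1 executor on node 1, 2 executors on node 2
node_capacity_mb = 5760   # assumed per-node YARN memory (HBase-installed default)
driver_mb = 2048
executor_mb = 2048

node1 = driver_mb + executor_mb   # 4096 MB requested on node 1
node2 = 2 * executor_mb           # 4096 MB requested on node 2
print(node1 <= node_capacity_mb and node2 <= node_capacity_mb)  # True
```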
In general, this sheet could help make sure you get the most out of your cluster.
Upvotes: 1