andret8

Reputation: 286

Spark: use of driver-memory parameter

When I submit the command below, my job fails with the error "Container is running beyond physical memory limits".

spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --conf spark.yarn.executor.memoryOverhead=1000

But after adding the --driver-memory parameter set to 5G (or higher), the job completes without error.

spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --driver-memory 5G --conf spark.yarn.executor.memoryOverhead=1000

Cluster info: 6 nodes with 120GB of memory. YARN container memory minimum: 1GB.

The question is: what difference does setting this parameter make?

Upvotes: 2

Views: 5390

Answers (1)

Prashant

Reputation: 772

If increasing the driver memory helps the job complete successfully, it means the driver is receiving a lot of data from the executors. Typically, the driver program is responsible for collecting results back from each executor after the tasks are executed. So, in your case, increasing the driver memory gave it enough room to hold the results the executors sent back.
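As a rough illustration (not your actual job), a driver-heavy action such as collect() pulls every partition's result into the driver JVM, so the collected data must fit within --driver-memory; the path and column name below are placeholders:

// Hypothetical Spark (Scala) sketch; "spark" is an existing SparkSession
val counts = spark.read.parquet("/data/events")  // placeholder input path
  .groupBy("userId")                             // placeholder column
  .count()
  .collect()                                     // every resulting row is shipped back to the driver
// If the collected array is large, the driver needs more heap (--driver-memory),
// even though the executors themselves have plenty of memory.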

If you read up on executor memory, driver memory, and the way the driver interacts with the executors, you will get a clearer picture of the situation you are in.
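For reference, the dedicated flags map to plain configuration keys, so an equivalent submission could look like the sketch below (values are illustrative; spark.driver.maxResultSize is an extra, optional cap on how much serialized result data the driver will accept from executors before failing the action with a clear error):

spark-submit --master yarn --deploy-mode cluster --conf spark.executor.memory=5G --conf spark.driver.memory=5G --conf spark.driver.maxResultSize=2G --conf spark.yarn.executor.memoryOverhead=1000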

Hope it helps to some extent.

Upvotes: 2
