LubaT

Reputation: 129

Spark Yarn Memory configuration

I have a spark application that keeps failing on error:

"Diagnostics: Container [pid=29328,containerID=container_e42_1512395822750_0026_02_000001] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used. Killing container."

I saw lots of different parameters that were suggested for increasing the physical memory. Can I please have some explanation of the following parameters?

We are running Spark on YARN (with the yarn-cluster deploy mode) using Cloudera CDH 5.12.1.

Upvotes: 3

Views: 6633

Answers (1)

Ryan Widmaier

Reputation: 8523

spark.driver.memory
spark.executor.memory

These control the base amount of memory Spark will try to allocate for its driver and for each of the executors. These are probably the ones you want to increase if you are running out of memory.
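For example, both can be raised on the spark-submit command line. A minimal sketch, assuming yarn-cluster mode; the 4g sizes and the class/jar names are placeholders, not values from your job:

# --driver-memory sets spark.driver.memory (in yarn-cluster mode the driver runs
#   inside the ApplicationMaster container)
# --executor-memory sets spark.executor.memory for each executor container
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 4g \
  --class com.example.MyApp \
  my-app.jar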

// options before Spark 2.3.0
spark.yarn.driver.memoryOverhead
spark.yarn.executor.memoryOverhead

// options after Spark 2.3.0
spark.driver.memoryOverhead
spark.executor.memoryOverhead

This value is an additional amount of memory to request when you are running Spark on YARN. It is intended to account for the extra RAM needed by the YARN containers that host your Spark driver and executors, beyond the base memory settings above.
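By default Spark sizes this overhead at roughly 10% of the corresponding memory setting, with a 384 MB floor, so raising it explicitly only matters when that is not enough. A minimal sketch using the pre-2.3 option names from above; the 1024 MB values are illustrative only, and the class/jar names are placeholders:

# The spark.yarn.*.memoryOverhead values are plain megabyte counts.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.MyApp \
  my-app.jar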

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb

When Spark asks YARN to reserve a block of RAM for an executor, it asks for the base memory plus the overhead memory. However, YARN may not grant a container of exactly that size. These parameters control the smallest and the largest container sizes that YARN will grant, as illustrated below. If you are only using the cluster for one job, I find it easiest to set these to very small and very large values, and then use the Spark memory settings mentioned above to set the true container size.
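To make the rounding concrete, here is a worked example; all of the numbers are assumptions for illustration, not values taken from your cluster:

# Assumed request: spark.executor.memory = 4g, spark.yarn.executor.memoryOverhead = 512 (MB)
#   4096 MB + 512 MB = 4608 MB requested per executor container
# With yarn.scheduler.minimum-allocation-mb = 1024, YARN typically normalizes the request
# up to the next multiple of that increment, so each container actually gets 5120 MB.
# If the request were larger than yarn.scheduler.maximum-allocation-mb, YARN would refuse
# it up front rather than grant a container.
# Both properties are set in yarn-site.xml (through Cloudera Manager on CDH).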

mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts/mapreduce.reduce.java.opts

I don't think these have any bearing on your Spark/Yarn job.

Upvotes: 4
