Reputation: 129
I have a spark application that keeps failing on error:
"Diagnostics: Container [pid=29328,containerID=container_e42_1512395822750_0026_02_000001] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used. Killing container."
I saw lots of different parameters that were suggested to change to increase the physical memory. Could I please have some explanation of the following parameters?
mapreduce.map.memory.mb
(currently set to 0, so it is supposed to take the default of 1 GB; why then do we see it as 1.5 GB? Changing it also didn't affect the number)
mapreduce.reduce.memory.mb
(currently set to 0, so it is supposed to take the default of 1 GB; why then do we see it as 1.5 GB? Changing it also didn't affect the number)
mapreduce.map.java.opts/mapreduce.reduce.java.opts
set to 80% of the previous number
yarn.scheduler.minimum-allocation-mb=1GB
(when changing this I do see an effect on the max physical memory, but for the value 1 GB it is still 1.5 GB)
yarn.app.mapreduce.am.resource.mb/spark.yarn.executor.memoryOverhead
I can't find these in the configuration at all.
We are running on YARN (yarn-cluster deployment mode) using Cloudera CDH 5.12.1.
Upvotes: 3
Views: 6633
Reputation: 8523
spark.driver.memory
spark.executor.memory
These control the base amount of memory Spark will try to allocate for its driver and for each of the executors. These are probably the ones you want to increase if you are running out of memory.
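As a sketch, these can be set on the command line when submitting the job (the JAR name here is a placeholder; adjust the sizes to your cluster):

```shell
# Hypothetical spark-submit invocation; your-app.jar is a placeholder.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.memory=2g \
  --conf spark.executor.memory=4g \
  your-app.jar
```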
// options before Spark 2.3.0
spark.yarn.driver.memoryOverhead
spark.yarn.executor.memoryOverhead
// options after Spark 2.3.0
spark.driver.memoryOverhead
spark.executor.memoryOverhead
These set an additional amount of memory to request when you are running Spark on YARN. They are intended to account for the extra RAM needed by the YARN container that is hosting your Spark executors.
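If you don't set the overhead explicitly, Spark derives it from the base memory as the larger of 384 MB and 10% of the base. A small Python sketch of that default rule (not Spark's actual code):

```python
# Sketch of Spark's default overhead rule: max(384 MB, 10% of base memory).
def default_memory_overhead_mb(base_memory_mb: int) -> int:
    return max(384, int(base_memory_mb * 0.10))

# A 1 GB executor therefore requests 1024 + 384 = 1408 MB in total.
print(default_memory_overhead_mb(1024))   # -> 384
print(default_memory_overhead_mb(8192))   # -> 819
```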
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
When Spark asks YARN to reserve a block of RAM for an executor, it asks for the base memory plus the overhead memory. However, YARN may not grant a container of exactly that size. These parameters control the smallest and largest container sizes that YARN will grant. If you are only using the cluster for one job, I find it easiest to set these to very small and very large values and then use the Spark memory settings mentioned above to set the true container size.
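That rounding is likely why a 1 GB base plus 384 MB overhead shows up as a 1.5 GB container: YARN rounds the request up to a multiple of its allocation increment (with the Fair Scheduler CDH uses, `yarn.scheduler.increment-allocation-mb`, typically 512 MB). A Python sketch of the idea, with hypothetical defaults mirroring the YARN setting names, not YARN's actual code:

```python
import math

# Sketch of how YARN sizes a granted container: clamp to the minimum,
# round up to a multiple of the increment, cap at the maximum.
def granted_container_mb(requested_mb: int,
                         minimum_allocation_mb: int = 1024,
                         increment_allocation_mb: int = 512,
                         maximum_allocation_mb: int = 8192) -> int:
    size = max(requested_mb, minimum_allocation_mb)
    size = math.ceil(size / increment_allocation_mb) * increment_allocation_mb
    return min(size, maximum_allocation_mb)

# 1024 MB base + 384 MB overhead = 1408 MB requested -> 1536 MB (1.5 GB) granted.
print(granted_container_mb(1024 + 384))   # -> 1536
```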
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts/mapreduce.reduce.java.opts
I don't think these have any bearing on your Spark/Yarn job.
Upvotes: 4