Simon

Reputation: 177

Resource Allocation with Spark and Yarn

I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:

Spark:

spark.driver.memory             4096m
spark.driver.memoryOverhead         3072m
spark.executor.memory           4096m
spark.executor.memoryOverhead           3072m
spark.executor.cores                3
spark.executor.instances            3

Yarn:

Minimum allocation: memory:1024, vCores:2
Maximum allocation: memory:9216, vCores:6

The application started by Zeppelin gets the following resources:
Running Containers      4
Allocated CPU VCores        4
Allocated Memory MB 22528

Yarn allocation
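
To see which of these settings the running application actually holds, one option is to print the effective configuration from the interpreter. A minimal sketch, assuming the Zeppelin %pyspark interpreter where sc is already defined:

conf = sc.getConf()

# Print the memory- and core-related keys the driver actually holds;
# get() with a default avoids an exception for keys that were never set.
for key in ["spark.driver.memory",
            "spark.driver.memoryOverhead",
            "spark.executor.memory",
            "spark.executor.memoryOverhead",       # Spark 2.3+ name
            "spark.yarn.executor.memoryOverhead",  # older name
            "spark.executor.cores",
            "spark.executor.instances"]:
    print(key, "=", conf.get(key, "<not set>"))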

  1. I don't quite understand the amount of memory allocated by yarn. Given the settings, I would assume yarn would reserve (4096m + 3072m) * 4 = 28672m. However, it looks like the spark.executor.memoryOverhead option is ignored (I also tried spark.yarn.executor.memoryOverhead with no effect), so the executors only get the default minimum of 384m as overhead. Since YARN rounds each container up to a multiple of the 1024m minimum allocation, that gives (4096m + 3072m) + (4096m + 1024m) * 3 = 22528m, where the first term is the driver and the second term sums up the executor memory (a short sketch below reproduces this arithmetic).

  2. Why are only 4 CPU VCores allocated, even though the minimum allocation is set to 2 vCores and I requested 3 cores per executor? When looking at the Application Master, I find the following executors:

Spark allocation

Here, the executors indeed have 3 cores each. How do I know which of the two values is correct, or what am I missing?
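
The following is a small sketch of the arithmetic behind question 1, assuming YARN rounds every container request (heap plus overhead) up to the next multiple of the minimum allocation and that the executors only received the default 384m overhead; it merely reproduces the hypothesis above, not Spark's actual sizing logic:

MIN_ALLOC_MB = 1024        # yarn.scheduler.minimum-allocation-mb
DEFAULT_OVERHEAD_MB = 384  # Spark's floor for executor memory overhead

def container_mb(heap_mb, overhead_mb):
    # Round the request up to the next multiple of the minimum allocation.
    requested = heap_mb + overhead_mb
    return ((requested + MIN_ALLOC_MB - 1) // MIN_ALLOC_MB) * MIN_ALLOC_MB

driver_mb = container_mb(4096, 3072)                        # 7168
executors_mb = 3 * container_mb(4096, DEFAULT_OVERHEAD_MB)  # 3 * 5120 = 15360
print(driver_mb + executors_mb)                             # 22528, matching the YARN UI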

  3. I tried a couple of settings, and in yarn-client mode I am supposed to use options such as spark.yarn.am.memory or spark.yarn.am.cores. However, it seems like those are ignored by yarn. Why is that? Additionally, in yarn-client mode the driver is supposed to run outside of yarn, so why are its resources still allocated within yarn? My Zeppelin instance runs on the same machine as one of the workers.
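
For reference, the spark.yarn.am.* settings mentioned above only apply in client mode and have to be in place before the SparkContext is created; they cannot be changed for a running application. A minimal sketch, assuming a standalone PySpark session rather than Zeppelin (in Zeppelin these properties would go into the Spark interpreter settings or spark-defaults.conf instead):

from pyspark.sql import SparkSession

# Sketch only: spark.yarn.am.* sizes the YARN Application Master in client mode
# and must be set before the context starts.
spark = (SparkSession.builder
         .master("yarn")                                  # client deploy mode by default
         .config("spark.yarn.am.memory", "1024m")         # AM heap (client mode only)
         .config("spark.yarn.am.memoryOverhead", "512m")  # AM off-heap overhead
         .config("spark.yarn.am.cores", "1")              # AM vCores (client mode only)
         .config("spark.executor.memory", "4096m")
         .config("spark.executor.memoryOverhead", "3072m")
         .config("spark.executor.cores", "3")
         .config("spark.executor.instances", "3")
         .getOrCreate())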

Upvotes: 2

Views: 1561

Answers (1)

Xiaoxiang Yu

Reputation: 115

A Spark application has three roles: the driver, the application master (AM), and the executors.

  1. In client mode (one of the deploy modes), the driver itself does not request resources from YARN, so YARN only has to allocate one application master and three executors. So I think Spark asks for (4G + 3G) * 3 = 21G for the three executors plus about 1G for the AM, which adds up to the allocated 22GB (22528MB).

  2. As for the core count, I think the Spark UI gives the correct answer, based on my experience. Most likely YARN is using the CapacityScheduler with the DefaultResourceCalculator, which only takes memory into account, so the ResourceManager UI simply reports one vCore per container (4 containers, hence 4 vCores) regardless of how many cores the executors really use.
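
One way to compare the two views is to ask the ResourceManager directly. A minimal sketch, assuming the RM web services are reachable at http://resourcemanager:8088 (replace the host and application id with your own), using YARN's cluster applications REST API:

import json
from urllib.request import urlopen

RM = "http://resourcemanager:8088"          # assumption: your ResourceManager address
APP_ID = "application_1234567890123_0001"   # assumption: your application id

# Fetch what YARN thinks it allocated to this application and compare it
# with the executors shown in the Spark UI.
with urlopen("{}/ws/v1/cluster/apps/{}".format(RM, APP_ID)) as resp:
    app = json.loads(resp.read().decode("utf-8"))["app"]

print("running containers :", app["runningContainers"])
print("allocated memory MB:", app["allocatedMB"])
print("allocated vCores   :", app["allocatedVCores"])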

Upvotes: 1
