simplycoding

Reputation: 2977

How is YARN ResourceManager's Total Memory calculated?

I'm running a Spark cluster on AWS EMR in a 1 master node, 3 worker node configuration, using YARN client mode with the master node as the client machine. All 4 nodes have 8GB of memory and 4 cores each. Given that hardware setup, I set the following:

spark.executor.memory = 5G
spark.executor.cores = 3
spark.yarn.executor.memoryOverhead = 600
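
For reference, here is a rough sketch of how those settings could be applied from PySpark (equivalent spark-submit --conf flags work the same way; anything application-specific is left out):

# Sketch: applying the settings above through a SparkConf in PySpark.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.executor.memory", "5g")
        .set("spark.executor.cores", "3")
        .set("spark.yarn.executor.memoryOverhead", "600"))  # value is in MB
sc = SparkContext(conf=conf)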

With that configuration, would the expected Total Memory recognized by YARN's ResourceManager be 15GB? It's displaying 18GB. I've only seen YARN use up to 15GB when running Spark applications. Is that 15GB coming from spark.executor.memory * 3 nodes?

I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead, but I can't find that documented anywhere. What's the proper way to find the exact number?

And I should be able to increase the value of spark.executor.memory to 6G, right? I've gotten errors in the past when it was set like that. Would there be other configurations I need to set?

Edit: It looks like the worker nodes' value for yarn.scheduler.maximum-allocation-mb is 6114MB, or roughly 6GB. This is the default that EMR sets for the instance type. Since 6GB * 3 = 18GB, that likely explains the number. I'd like to restart YARN and increase that value from 6GB to 7GB, but I can't since the cluster is in use, so I guess my question still stands.
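
A quick sanity check of that arithmetic (assuming the ResourceManager total is just the per-node YARN memory summed over the 3 worker nodes, since the master node runs no NodeManager):

# Rough check of the 18GB Total Memory figure.
max_alloc_mb = 6114   # yarn.scheduler.maximum-allocation-mb, EMR default for this instance type
worker_nodes = 3
print(max_alloc_mb * worker_nodes)            # 18342 MB
print(max_alloc_mb * worker_nodes / 1024.0)   # ~17.9, displayed as 18GB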

Upvotes: 4

Views: 1819

Answers (1)

David

Reputation: 11583

I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead, but I can't find that documented anywhere. What's the proper way to find the exact number?

This is sort of correct, but stated backwards. YARN's total memory is independent of any configuration you set for Spark. yarn.scheduler.maximum-allocation-mb controls how much memory YARN has access to, and its defaults per EMR instance type can be found here. To use all available memory with Spark, you would set spark.executor.memory + spark.yarn.executor.memoryOverhead to equal yarn.scheduler.maximum-allocation-mb. See here for more information on tuning your Spark job, and this spreadsheet for calculating configurations.
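
As a minimal sketch of that rule, plugging in the numbers from your question and edit (6114M per container on these EMR worker nodes):

# Does spark.executor.memory + spark.yarn.executor.memoryOverhead fit in one YARN container?
max_alloc_mb = 6114                   # yarn.scheduler.maximum-allocation-mb
executor_memory_mb = 5 * 1024         # spark.executor.memory = 5G
memory_overhead_mb = 600              # spark.yarn.executor.memoryOverhead = 600
requested_mb = executor_memory_mb + memory_overhead_mb
print(requested_mb, requested_mb <= max_alloc_mb)   # 5720 True -> your current settings fit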

And I should be able to increase the value of spark.executor.memory to 6G, right?

Based on the spreadsheet, the upper limit of spark.executor.memory is 5502M if yarn.scheduler.maximum-allocation-mb is 6114M. Calculated by hand, that is 0.9 * 6114, since spark.yarn.executor.memoryOverhead defaults to executorMemory * 0.10, with a minimum of 384 (source).
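
Worked out in code (a sketch of the same hand calculation; the spreadsheet may round slightly differently):

# Overhead defaults to 10% of executor memory (minimum 384MB), so executor memory
# can take roughly 90% of the container.
max_alloc_mb = 6114
upper_limit_mb = int(0.9 * max_alloc_mb)              # 5502
overhead_mb = max(384, int(upper_limit_mb * 0.10))    # 550
print(upper_limit_mb, upper_limit_mb + overhead_mb)   # 5502 6052 -> fits within 6114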

Upvotes: 1
