Reputation: 2977
I'm running a Spark cluster on AWS EMR in a 1 master node / 3 worker node configuration, using YARN client mode with the master node as the client machine. All 4 nodes have 8GB of memory and 4 cores each. Given that hardware setup, I set the following:
spark.executor.memory = 5G
spark.executor.cores = 3
spark.yarn.executor.memoryOverhead = 600
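(For illustration, a minimal PySpark sketch of how properties like these can be set when building the session; the app name below is just a placeholder, and the same values could equally be passed via spark-submit --conf or spark-defaults.conf:)

    from pyspark.sql import SparkSession

    # Illustrative only: mirrors the three properties listed above.
    spark = (
        SparkSession.builder
        .appName("memory-sizing-example")                    # placeholder name
        .config("spark.executor.memory", "5g")
        .config("spark.executor.cores", "3")
        .config("spark.yarn.executor.memoryOverhead", "600")  # value is in MB
        .getOrCreate()
    )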
With that configuration, would the expected Total Memory recognized by YARN's ResourceManager be 15GB? It's displaying 18GB. I've only seen YARN use up to 15GB when running Spark applications. Is that 15GB from spark.executor.memory * 3 nodes?
I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead, but I can't find that documented anywhere. What's the proper way to find the exact number?
And I should be able to increase the value of spark.executor.memory to 6G, right? I've gotten errors in the past when it was set like that. Would there be other configurations I need to set?
Edit: So it looks like the worker nodes' value for yarn.scheduler.maximum-allocation-mb is 6114 MB, or roughly 6GB. This is the default that EMR sets for the instance type. And since 6GB * 3 nodes = 18GB, that likely explains the 18GB. I want to restart YARN and increase that value from 6GB to 7GB, but can't since the cluster is in use, so I guess my question still stands.
Upvotes: 4
Views: 1819
Reputation: 11583
I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead but I can't find that documented anywhere. What's the proper way to find the exact number?
This is sort of correct, but said backwards. YARN's total memory is independent of any configuration you set for Spark. yarn.scheduler.maximum-allocation-mb controls the largest single container YARN will grant (yarn.nodemanager.resource.memory-mb controls how much memory each node offers to YARN; EMR defaults both to the same value for a given instance type), and it can be found here. To use all available memory with Spark, you would set spark.executor.memory + spark.yarn.executor.memoryOverhead to equal yarn.scheduler.maximum-allocation-mb. See here for more info on tuning your Spark job, and this spreadsheet for calculating configurations.
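To make that rule concrete, here is a small worked check (a Python sketch; the 6114M limit and the 5G/600M settings come from the question, the rest is plain arithmetic):

    # Per-container limit YARN will grant (yarn.scheduler.maximum-allocation-mb)
    max_allocation_mb = 6114

    # Current settings from the question
    executor_memory_mb = 5 * 1024   # spark.executor.memory = 5G
    memory_overhead_mb = 600        # spark.yarn.executor.memoryOverhead = 600

    requested = executor_memory_mb + memory_overhead_mb
    print(requested, "MB requested per executor")                      # 5720
    print("fits" if requested <= max_allocation_mb else "rejected")    # fits

    # With 6G executors the request exceeds the limit, which is why raising
    # spark.executor.memory to 6G alone produced errors:
    print(6 * 1024 + 600, "MB >", max_allocation_mb, "MB")             # 6744 > 6114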
And I should be able to increase the value of spark.executor.memory to 6G right?
Based on the spreadsheet, the upper limit of spark.executor.memory is 5502M if yarn.scheduler.maximum-allocation-mb is 6114M. Calculated by hand, this is roughly 0.9 * 6114, since spark.yarn.executor.memoryOverhead defaults to executorMemory * 0.10, with a minimum of 384 (source).
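As a sanity check, that default-overhead rule can be worked through directly. The sketch below just encodes the executorMemory * 0.10 (minimum 384) default quoted above and tests a few executor sizes against the 6114M limit; the exact rounding Spark applies internally may differ slightly:

    # Default overhead when spark.yarn.executor.memoryOverhead is not set:
    # max(0.10 * executorMemory, 384), all values in MB.
    def default_overhead(executor_memory_mb):
        return max(int(executor_memory_mb * 0.10), 384)

    max_allocation_mb = 6114

    for executor_memory_mb in (5502, 5558, 6144):
        overhead = default_overhead(executor_memory_mb)
        total = executor_memory_mb + overhead
        verdict = "fits" if total <= max_allocation_mb else "too big"
        print(executor_memory_mb, "+", overhead, "=", total, "->", verdict)

    # 5502 + 550 = 6052 -> fits    (the spreadsheet's conservative 0.9 * 6114)
    # 5558 + 555 = 6113 -> fits    (about the true ceiling, 6114 / 1.10)
    # 6144 + 614 = 6758 -> too big (why 6G executors get rejected)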
Upvotes: 1