Reputation: 184
My cluster configuration is as below :- 7 nodes, each with 32 cores and 252 GB of memory.
The yarn configurations are as below :-
yarn.scheduler.maximum-allocation-mb - 10GB
yarn.scheduler.minimum-allocation-mb - 2GB
yarn.nodemanager.vmem-pmem-ratio - 2.1
yarn.nodemanager.resource.memory-mb - 22GB
yarn.scheduler.maximum-allocation-vcores - 25
yarn.scheduler.minimum-allocation-vcores - 1
yarn.nodemanager.resource.cpu-vcores - 25
The map reduce configurations are as below :-
mapreduce.map.java.opts - -Xmx1638m
mapreduce.map.memory.mb - 2GB
mapreduce.reduce.java.opts - -Xmx3276m
mapreduce.reduce.memory.mb - 4GB
The Spark configurations are as below :-
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
Now I tried running the spark-shell with master set to yarn and different values for executor-memory, num-executors and executor-cores.
In this case the executor memory + 384 MB cannot exceed the 10 GB maximum of the YARN scheduler. So in this case 9856 MB + 384 MB = 10 GB, so it works fine. Once the spark shell was up, the total number of executors was 124 instead of the requested 175. The storage memory as seen in the spark-shell start logs or the Spark UI for each executor is 6.7 GB (i.e. 67% of 10 GB).
The top command output for the spark shell process is as below:-
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
8478 hdp66-ss 20 0 13.5g 1.1g 25m S 1.9 0.4 2:11.28
So the virtual memory is 13.5 GB and the physical memory is 1.1 GB.
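As a rough sketch of how I understand the YARN container size for one executor to be derived in this first case (the rounding of requests to multiples of the minimum allocation is my assumption about the scheduler defaults, not something taken from the logs):

// Rough sketch of the per-executor container sizing, based on the values above -- illustration only.
object ContainerSizeEstimate {
  def main(args: Array[String]): Unit = {
    val executorMemoryMb = 9856L   // --executor-memory 9856M
    val memoryOverheadMb = 384L    // spark.yarn.executor.memoryOverhead
    val minAllocationMb  = 2048L   // yarn.scheduler.minimum-allocation-mb
    val maxAllocationMb  = 10240L  // yarn.scheduler.maximum-allocation-mb

    // Spark asks YARN for executor memory plus the overhead ...
    val requestedMb = executorMemoryMb + memoryOverheadMb
    // ... and YARN rounds the request up to a multiple of the minimum allocation.
    val containerMb = math.ceil(requestedMb.toDouble / minAllocationMb).toLong * minAllocationMb

    // Prints: Requested 10240 MB -> container of 10240 MB (fits: true)
    println(s"Requested $requestedMb MB -> container of $containerMb MB (fits: ${containerMb <= maxAllocationMb})")
  }
}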
In this case too the executor memory + 384 MB cannot exceed the 10 GB maximum of the YARN scheduler, and 9856 MB + 384 MB = 10 GB, so it works fine. Once the spark shell was up, the total number of executors was 35. The storage memory as seen in the spark-shell start logs or the Spark UI for each executor is again 6.7 GB (i.e. 67% of 10 GB).
The top command output for the spark shell process is as below:-
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
5256 hdp66-ss 20 0 13.2g 1.1g 25m S 2.6 0.4 1:25.25
So the virtual memory is 13.2 GB and the physical memory is 1.1 GB.
In this case the executor memory + 384 MB cannot exceed the 10 GB maximum of the YARN scheduler. So in this case 4096 MB + 384 MB = 4480 MB, well under the maximum, so it works fine. Once the spark shell was up, the total number of executors was 200. The storage memory as seen in the spark-shell start logs or the Spark UI for each executor is 2.7 GB (i.e. 67% of 4 GB).
The top command output for the spark shell process is as below:-
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
21518 hdp66-ss 20 0 19.2g 1.4g 25m S 3.9 0.6 2:24.46
So the virtual memory is 19.2 GB and the physical memory is 1.4 GB.
So can someone please explain to me how these memory sizes and executor counts are decided? Why is the memory seen on the Spark UI 67% of the requested executor memory? And how are the virtual and physical memory decided for each executor?
Upvotes: 4
Views: 2022
Reputation: 1631
Spark almost always allocates 65% to 70% of the memory a user requests for the executors. This behavior of Spark is due to the Spark JIRA ticket SPARK-12579.
// From Spark's UnifiedMemoryManager.getMaxMemory (surrounding context added for clarity):
private def getMaxMemory(conf: SparkConf): Long = {
  // systemMemory is the executor's JVM heap (Runtime.getRuntime.maxMemory);
  // reservedMemory is a fixed 300 MB (RESERVED_SYSTEM_MEMORY_BYTES) kept for Spark internals.
  val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
  val reservedMemory = conf.getLong("spark.testing.reservedMemory",
    if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
  val minSystemMemory = (reservedMemory * 1.5).ceil.toLong
  // Fail fast if the requested executor memory is too small to be usable.
  if (conf.contains("spark.executor.memory")) {
    val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
    if (executorMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
        s"$minSystemMemory. Please increase executor memory using the " +
        s"--executor-memory option or spark.executor.memory in Spark configuration.")
    }
  }
  // Only spark.memory.fraction of what is left after the reservation is reported
  // as the executor's usable (storage + execution) memory.
  val usableMemory = systemMemory - reservedMemory
  val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
  (usableMemory * memoryFraction).toLong
}
The above code is responsible for the behavior you are seeing. The fail-fast check is a safeguard for the scenario where the cluster does not have as much memory as the user requested, and the last three lines explain the figure on the UI: Spark keeps a fixed reservation of the heap for itself and reports only spark.memory.fraction of the remainder as the executor's usable memory, which is why you see roughly 65% to 70% of what you asked for.
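To make the numbers from the question concrete, here is a standalone sketch of that last calculation (my own illustration, not Spark code). It assumes spark.memory.fraction = 0.75, the Spark 1.6 default (Spark 2.x uses 0.6):

// Standalone sketch of the getMaxMemory arithmetic for a 9856 MB executor -- illustration only.
object ExecutorMemoryEstimate {
  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val executorHeap   = 9856L * mb  // --executor-memory 9856M
    val reservedMemory = 300L * mb   // RESERVED_SYSTEM_MEMORY_BYTES
    val memoryFraction = 0.75        // spark.memory.fraction (assumed Spark 1.6 default)

    val usableMemory = executorHeap - reservedMemory
    val maxMemory    = (usableMemory * memoryFraction).toLong

    // Prints ~7167 MB; on a real executor Runtime.getRuntime.maxMemory is somewhat
    // smaller than -Xmx, which brings the figure down to roughly the 6.7 GB on the UI.
    println(s"Memory reported for the executor ~ ${maxMemory / mb} MB")
  }
}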
Upvotes: 2