Reputation: 1193
I have a 5 worker node cluster with 6 GB of memory each (Spark executor memory is set at 4608 MB).
I have been running out of memory, with Spark telling me that one of my executors was trying to use more than 5.0 GB of memory. If each executor gets 5 GB of memory, then I should have 25 GB of memory overall across the entire cluster.
ExecutorLostFailure (executor 4 exited caused by one of the running tasks)
Reason: Container killed by YARN for exceeding memory limits. 5.0 GB of 5.0 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
At the beginning of my Spark application, when I look at one of my RDDs in the Storage tab (it is the only RDD in the cache at this point), I see:
RDD Name | Storage Level | Cached Partitions | Fraction Cached | Size in Memory | Size on Disk
myRDD | Memory Serialized 1x Replicated | 20 | 100% | 3.2 GB | 0.0 B

Host | On Heap Memory Usage | Off Heap Memory Usage | Disk Usage
Node 1 | 643.5 MB (1931.3 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Master | 0.0 B (366.3 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 2 | 654.8 MB (1920.0 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 3 | 644.2 MB (1930.6 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 4 | 656.2 MB (1918.6 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 5 | 652.4 MB (1922.4 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
This seems to show that each node only has about 2.5 GB of memory available for storage. The Storage tab also never comes close to showing 25 GB of cached RDDs before my Spark application fails with an out-of-memory error.
How can I find out how much memory is allocated for cached RDDs?
Upvotes: 2
Views: 1002
Reputation: 10082
While submitting a job, you can specify the parameter spark.memory.storageFraction. The default value for this is 0.5.
So, in your case, where you are allocating about 5 GB of memory per executor, roughly 2.5 GB will be reserved for caching and the remaining 2.5 GB will be used for execution.
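For illustration, here is a minimal sketch of setting these properties programmatically through SparkConf; the application name and values are placeholders mirroring the question's setup, not recommendations, and the same settings can equally be passed as --conf options to spark-submit:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only, mirroring the setup described in the question.
val conf = new SparkConf()
  .setAppName("memory-demo")                        // placeholder application name
  .set("spark.executor.memory", "4608m")            // executor heap size
  .set("spark.yarn.executor.memoryOverhead", "512") // off-heap overhead (MB) added to the YARN container
  .set("spark.memory.storageFraction", "0.5")       // fraction of unified memory immune to eviction

val sc = new SparkContext(conf)
```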
From Memory Management:
spark.memory.storageFraction
Amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction. The higher this is, the less working memory may be available to execution and tasks may spill to disk more often. Leaving this at the default value is recommended. For more detail, see this description.
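As a side note (a sketch, not part of the quoted documentation): if you want to confirm at runtime how much memory each executor actually has available for cached RDDs, SparkContext.getExecutorMemoryStatus returns, per executor, the maximum memory available for caching and how much of it is still free:

```scala
// Print the storage memory Spark reports for each executor, in MB.
// getExecutorMemoryStatus: Map[executorAddress, (maxMemoryForCaching, remainingMemory)] in bytes.
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remaining)) =>
  println(f"$executor%-25s max for caching: ${maxMem / 1048576.0}%.1f MB, remaining: ${remaining / 1048576.0}%.1f MB")
}
```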
Upvotes: 2