Reputation: 181
I know how to calculate executor cores and memory.But Can anyone explain on what basis spark.driver.memory is calculated ?
Upvotes: 3
Views: 7105
Reputation: 714
Operations on Dataset
s such as collect
take
require moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
You increase spark.driver.memory
when you collect large volumes to the driver.
As per
High Performance Spark by Holden Karau and Rachel Warren (O’Reilly)
most of the computational work of a Spark query is performed by the executors, so increasing the size of the driver rarely speeds up a computation. However, jobs may fail if they collect too much data to the driver or perform large local computations. Thus, increasing the driver memory and correspondingly the value of
spark.driver.maxResultSize
may prevent the out-of-memory errors in the driver.A good heuristic for setting the Spark driver memory is simply the lowest possible value that does not lead to memory errors in the driver, i.e., which gives the maximum possible resources to the executors.
Upvotes: 4
Reputation: 978
Spark driver memory is the amount of memory to use for the driver process, i.e. the process running the main() function of the application and where SparkContext is initialized, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
JVM memory is divided into separate parts. At broad level, JVM Heap memory is physically divided into two parts – Young Generation and Old Generation.
Young generation is the place where all the new objects are created. When young generation is filled, garbage collection is performed. This garbage collection is called Minor GC.
Old Generation memory contains the objects that are long lived and survived after many rounds of Minor GC. Usually garbage collection is performed in Old Generation memory when it’s full. Old Generation Garbage Collection is called Major GC and usually takes longer time.
Java Garbage Collection is the process to identify and remove the unused objects from the memory and free space to be allocated to objects created in the future processing. One of the best feature of java programming language is the automatic garbage collection, unlike other programming languages such as C where memory allocation and deallocation is a manual process.
Garbage Collector is the program running in the background that looks into all the objects in the memory and find out objects that are not referenced by any part of the program. All these unreferenced objects are deleted and space is reclaimed for allocation to other objects.
Sources:
https://spark.apache.org/docs/latest/configuration.html
Upvotes: 0