Ramprakash

Reputation: 181

Spark Driver Memory calculation

I know how to calculate executor cores and memory, but can anyone explain on what basis spark.driver.memory is calculated?

Upvotes: 3

Views: 7105

Answers (2)

ryandam

Reputation: 714

Dataset operations such as collect and take move data into the application's driver process (collect moves all of it), and doing so on a very large dataset can crash the driver process with an OutOfMemoryError.

Increase spark.driver.memory when you need to collect large volumes of data to the driver.

As per High Performance Spark by Holden Karau and Rachel Warren (O’Reilly):

most of the computational work of a Spark query is performed by the executors, so increasing the size of the driver rarely speeds up a computation. However, jobs may fail if they collect too much data to the driver or perform large local computations. Thus, increasing the driver memory and correspondingly the value of spark.driver.maxResultSize may prevent the out-of-memory errors in the driver.

A good heuristic for setting the Spark driver memory is simply the lowest possible value that does not lead to memory errors in the driver, i.e., which gives the maximum possible resources to the executors.
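To make the heuristic concrete, here is a minimal sketch of passing both settings at submit time. The helper name driver_submit_args, the values 2g and 1g, and the script name my_job.py are all hypothetical; --driver-memory and spark.driver.maxResultSize are the real spark-submit option and configuration key. In client mode the driver JVM has already started by the time application code runs, so the driver memory is normally set on the command line or in the defaults file rather than through SparkConf inside the application.

```python
def driver_submit_args(driver_memory, max_result_size):
    """Build the spark-submit arguments that control driver memory.

    Both values are JVM-style memory strings such as "2g" or "512m".
    """
    return [
        "--driver-memory", driver_memory,
        "--conf", f"spark.driver.maxResultSize={max_result_size}",
    ]


# Hypothetical starting point: begin small and raise the values only if the
# driver actually fails with memory errors.
command = ["spark-submit", *driver_submit_args("2g", "1g"), "my_job.py"]
print(" ".join(command))
```

Keeping spark.driver.maxResultSize below the driver memory leaves headroom for the driver's own objects; the exact ratio depends on the job and is not something this sketch decides for you.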

Upvotes: 4

Driss NEJJAR

Reputation: 978

Spark driver memory is the amount of memory to use for the driver process, i.e. the process running the main() function of the application and where SparkContext is initialized, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
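As a rough illustration of the memory string format, the sketch below converts values such as 512m or 2g into bytes. The function name memory_string_to_bytes is hypothetical and not part of Spark. It handles only the four single-letter suffixes mentioned above and assumes they are binary units (1k = 1024 bytes); Spark's own parser accepts more forms.

```python
import re

# Binary multipliers for the single-letter suffixes.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def memory_string_to_bytes(value):
    """Convert a JVM-style memory string such as "512m" to a byte count."""
    match = re.fullmatch(r"(\d+)([kmgt])", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported memory string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


print(memory_string_to_bytes("512m"))  # 536870912
print(memory_string_to_bytes("2g"))    # 2147483648
```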

JVM memory is divided into separate parts. At a broad level, JVM heap memory is physically divided into two parts: the Young Generation and the Old Generation.

The Young Generation is where all new objects are created. When the Young Generation fills up, garbage collection is performed. This garbage collection is called Minor GC.

Old Generation memory contains long-lived objects that have survived many rounds of Minor GC. Garbage collection is usually performed in the Old Generation when it is full. Old Generation garbage collection is called Major GC and usually takes longer.

Java garbage collection is the process of identifying and removing unused objects from memory, freeing space for objects created later. One of the best features of the Java programming language is automatic garbage collection, unlike languages such as C, where memory allocation and deallocation are manual.

The garbage collector is a program running in the background that examines all objects in memory and finds those that are not referenced by any part of the program. These unreferenced objects are deleted and their space is reclaimed for allocation to other objects.

Sources:

https://spark.apache.org/docs/latest/configuration.html

https://www.journaldev.com/2856/java-jvm-memory-model-memory-management-in-java#java-memory-model-8211-permanent-generation

Upvotes: 0
