Reputation: 57
My question is about memory management and GC inside Spark.
If I create an RDD, how long will it live in my executor memory?
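# Program starts
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("").master("yarn").getOrCreate()
df = spark.range(10)  # lazily defined; nothing is computed yet
df.show()             # action: triggers the computation
# other operations
# Program ends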
Upvotes: 0
Views: 1367
Reputation: 933
"How long will it live in my executor memory?"
In this particular case Spark will never materialize the full dataset; instead it iterates through the rows one by one. Only a few operators materialize the full dataset, including sorts, joins, group-bys, and writes.
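To make the difference concrete, here is a rough pure-Python analogy (not Spark's actual implementation) of a row-at-a-time operator versus a blocking one. The function names `read_rows`, `double`, `sort_desc` and `take` are made up for illustration; the log lists just record when each row is actually produced.

```python
from itertools import islice

def read_rows(n, log):
    """Yield rows lazily, recording when each row is actually produced."""
    for i in range(n):
        log.append(f"read {i}")
        yield i

def double(rows, log):
    """Row-at-a-time operator: holds only the current row."""
    for row in rows:
        log.append(f"double {row}")
        yield row * 2

def sort_desc(rows, log):
    """Blocking operator: must buffer every input row before emitting one."""
    buffered = list(rows)
    log.append(f"sorted {len(buffered)} rows")
    yield from sorted(buffered, reverse=True)

def take(rows, k):
    """Pull at most k rows, similar in spirit to show() asking for the first rows."""
    return list(islice(rows, k))

# Streaming pipeline: only the rows that are requested get read.
streaming_log = []
streaming = take(double(read_rows(10, streaming_log), streaming_log), 2)

# Pipeline with a sort: all 10 rows are read and held before the first result.
blocking_log = []
blocking = take(sort_desc(read_rows(10, blocking_log), blocking_log), 2)

print(streaming, len(streaming_log))  # [0, 2] 4
print(blocking, len(blocking_log))    # [9, 8] 11
```

In the streaming case only two rows are ever read, and each one can be discarded as soon as it has been passed along. The sort has to hold all ten rows at once, which is the kind of operator that actually occupies executor memory.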
"Will it be automatically deleted once my execution finishes?"
Spark automatically cleans up any temporary data.
"If yes, is there any way to delete it manually during program execution?"
Spark only keeps that data around if it's in use or has been manually persisted. What are you trying to accomplish in particular?
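If you do persist something yourself, you can also release it yourself. Below is a minimal sketch: `persisted` is a hypothetical helper (not part of Spark) that wraps the real `DataFrame.persist()` and `DataFrame.unpersist()` calls so the cached data is dropped as soon as the `with` block ends. `example_with_spark` and the app name `persist-demo` are likewise just for illustration, and that function needs a working PySpark installation.

```python
from contextlib import contextmanager

@contextmanager
def persisted(df):
    """Cache df for the duration of a with-block, then release it.

    Hypothetical helper: only df.persist() and df.unpersist() are Spark API.
    """
    cached = df.persist()      # lazy: blocks are stored when the first action runs
    try:
        yield cached
    finally:
        cached.unpersist()     # ask Spark to drop the cached blocks

def example_with_spark():
    """Usage sketch; requires PySpark and is not called automatically."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("persist-demo").getOrCreate()
    df = spark.range(10)
    with persisted(df) as cached:
        cached.count()   # first action fills the cache
        cached.show()    # can be served from the cached blocks
    # from here on the data is no longer pinned in executor storage
    spark.stop()
```

Without an explicit `persist()` or `cache()` there is nothing to delete by hand: intermediate rows simply become garbage once the task that produced them is done.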
"How and when is garbage collection called in Spark?"
Spark runs on the JVM, and the JVM will automatically run GC when certain thresholds are hit.
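You don't trigger GC yourself, but you can watch when the JVM decides to run it by turning on GC logging for the executors. A small sketch, assuming you are free to set `spark.executor.extraJavaOptions` for your job; the helper names `gc_logging_conf` and `build_session` are hypothetical, and `-verbose:gc` is only the most basic of the JVM's GC logging flags.

```python
def gc_logging_conf():
    """Spark settings that make each executor JVM log its GC activity."""
    return {
        # Standard Spark property for passing JVM flags to executors.
        "spark.executor.extraJavaOptions": "-verbose:gc",
    }

def build_session(app_name, conf):
    """Create a SparkSession with the given settings (requires PySpark)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()

# Usage, on a machine with PySpark installed:
#   spark = build_session("gc-demo", gc_logging_conf())
```

The GC messages should then show up in the executors' stdout logs on the worker nodes rather than in the driver output.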
Upvotes: 1
Reputation: 11449
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
https://spark.apache.org/docs/2.2.0/tuning.html#memory-management-overview
Upvotes: 1