Reputation: 3599
What happens to a persisted Spark RDD once the Spark job finishes successfully?
Do we need to explicitly write some code to unpersist as well?
Or does unpersisting happen automatically for each persisted RDD?
Upvotes: 1
Views: 972
Reputation: 300
Do we need to explicitly write some code to unpersist as well?
Yes.
Does unpersisting happen automatically for each persisted RDD?
No, you need to do it explicitly by calling
RDD.unpersist()
or
df1.unpersist()
and you should always unpersist the DataFrame at the end of its lineage, i.e. after the last action that involves the persisted/cached DataFrame.
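A minimal sketch of that pattern, assuming a local SparkSession and a toy df1 (the session setup and sample data are illustrative, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; in a real job this already exists
val spark = SparkSession.builder()
  .appName("UnpersistExample")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

df1.persist()                                 // mark df1 for caching (lazy, nothing cached yet)

val total    = df1.count()                    // first action materializes the cache
val filtered = df1.filter($"id" > 1).count()  // second action reuses the cached data

df1.unpersist()                               // release the cache after the last action on df1
```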
Upvotes: 2
Reputation: 849
The official Spark documentation says:
Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
Please take a look at http://spark.apache.org/docs/latest/programming-guide.html#removing-data
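As a further sketch under the same assumptions (a local SparkSession, toy data), you can list what is currently persisted and remove an RDD manually instead of waiting for the LRU eviction described above; sc.getPersistentRDDs is part of the Scala SparkContext API:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("ManualUnpersist")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(1 to 1000)
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()  // action materializes the cached partitions

// Inspect which RDDs are currently persisted on this SparkContext
sc.getPersistentRDDs.foreach { case (id, r) =>
  println(s"RDD id=$id storage=${r.getStorageLevel.description}")
}

// Remove the RDD manually instead of waiting for it to fall out of the cache
rdd.unpersist(blocking = true)
spark.stop()
```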
Upvotes: 1