Mahsa

Reputation: 1550

How to uncache an RDD?

I used cache() to cache the data in memory, but to measure performance without cached data I need to uncache it and remove the data from memory:

rdd.cache();
//doing some computation
...
rdd.uncache()

but I got an error saying:

value uncache is not a member of org.apache.spark.rdd.RDD[(Int, Array[Float])]

I don't know how to uncache it, then!

Upvotes: 35

Views: 36861

Answers (4)

Sankar

Reputation: 413

If you want to remove all the cached RDDs, use this:

// sc is the SparkContext; getPersistentRDDs returns a Map from RDD id to the persisted RDD
for ((k, v) <- sc.getPersistentRDDs) {
  v.unpersist()
}

Upvotes: 9

Josh Rosen

Reputation: 13821

An RDD can be uncached using unpersist():

rdd.unpersist()

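To compare performance with and without the cache, as the question describes, here is a minimal self-contained sketch; the data, app name, and local master are illustrative assumptions, not from the original post:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("unpersist-demo").setMaster("local[*]"))

// roughly the question's RDD[(Int, Array[Float])] shape, with made-up data
val rdd = sc.parallelize(1 to 100000).map(i => (i, Array.fill(8)(i.toFloat)))

rdd.cache()     // mark the RDD for in-memory caching
rdd.count()     // first action materializes and caches it
rdd.count()     // served from the cache

rdd.unpersist() // remove the cached blocks from memory
rdd.count()     // recomputed from scratch, without the cache

sc.stop()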

Upvotes: 60

Anupam Mahapatra

Reputation: 833

If you cache the source data in an RDD by using .cache(), you have configured only a small amount of memory (or the default is used, which is about 500 MB for me), and you run the code again and again, then memory errors occur.

Try clearing all RDDs at the end of the code, so that each time the code runs, the RDDs are created and also cleared from memory.

Do this by using: RDD_Name.unpersist()
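
Note that unpersist also accepts an optional blocking flag; passing blocking = true makes the call wait until the blocks have actually been removed (a one-liner using the RDD_Name placeholder from above):

RDD_Name.unpersist(blocking = true)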

Upvotes: 5

eliasah

Reputation: 40370

The uncache function doesn't exist. I think you were looking for unpersist, which, according to the Spark ScalaDoc, marks the RDD as non-persistent and removes all blocks for it from memory and disk.
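
To verify the effect, you can inspect the RDD's storage level before and after; a small sketch, assuming the rdd value from the question and that cache() uses the default MEMORY_ONLY level:

import org.apache.spark.storage.StorageLevel

rdd.cache()
assert(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY) // cache() is persist(StorageLevel.MEMORY_ONLY)
rdd.unpersist()
assert(rdd.getStorageLevel == StorageLevel.NONE)        // blocks removed; the RDD is no longer persisted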

Upvotes: 12
