Y0gesh Gupta

Reputation: 2204

How long can RDDs be persisted in Spark?

I have written a program where I persist an RDD inside a Spark Streaming job, so that when a new RDD arrives from the stream I can join it with the previously cached RDDs. Is there a way to set a time-to-live for these persisted RDDs, so I can make sure I am not joining RDDs that I already received in the last stream cycle?

It would also be great if someone could explain, or point to documentation on, how RDD persistence works: once I get the persisted RDDs back from the Spark context, how can I join them with my current RDDs?
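For context, this is a minimal sketch of the pattern I mean, assuming a hypothetical socket source on localhost:9999 producing key/value pairs (the object name, source, and batch interval are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JoinWithCachedRdd {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinWithCachedRdd").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical initial data, persisted so it can be reused each batch.
    var cached: RDD[(String, Int)] = ssc.sparkContext
      .parallelize(Seq(("a", 1), ("b", 2)))
      .cache()

    // Hypothetical stream of key/value pairs.
    val stream = ssc.socketTextStream("localhost", 9999)
      .map(line => (line, 1))

    stream.foreachRDD { batch =>
      // Join each new batch against the previously cached RDD.
      val joined = batch.join(cached)
      joined.foreach(println)

      // Replace the cache so the next cycle joins against this batch
      // rather than RDDs from earlier cycles.
      cached.unpersist()
      cached = batch.cache()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```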

Upvotes: 1

Views: 1878

Answers (1)

maasg

Reputation: 37435

In Spark Streaming, the time-to-live of an RDD generated by the streaming process is controlled by the spark.cleaner.ttl configuration setting. It defaults to infinite, but for it to have any effect you also need to set spark.streaming.unpersist to false, so that Spark Streaming 'lets live' the RDDs it generates instead of unpersisting them as soon as each batch completes.

Note that there's no per-RDD ttl possible.
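A minimal sketch of how these two settings could be wired up, assuming a placeholder app name and a 10-second batch interval:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("StreamWithTtl") // placeholder app name
  .setMaster("local[2]")
  // Keep the RDDs generated by the streaming job alive instead of
  // unpersisting them as soon as each batch completes.
  .set("spark.streaming.unpersist", "false")
  // Let the cleaner remove persisted RDDs and metadata older than
  // 3600 seconds (the value is in seconds).
  .set("spark.cleaner.ttl", "3600")

val ssc = new StreamingContext(conf, Seconds(10))
```

Note that the ttl applies globally to the cleaner, which is consistent with there being no per-RDD setting.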

Upvotes: 1
