Reputation: 2204
I have written a program where I persist RDDs inside a Spark stream, so that once a new RDD arrives from the stream I can join the previously cached RDDs with the new one. Is there a way to set a time-to-live for these persisted RDDs, so that I can make sure I am not joining RDDs that I already received in the last stream cycle?
It would also be great if someone could explain, or point to an explanation of, how persistence of RDDs works: when I get the persisted RDDs from the Spark context, how can I join them with my current RDDs?
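For concreteness, here is a minimal sketch of the pattern being described, assuming keyed (String, Int) RDDs and a hypothetical socket source; all names, types, and the batch interval are illustrative, not part of the original question:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CachedJoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CachedJoinExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Holds the RDD persisted from the previous batch (None on the first one).
    var previousBatch: Option[RDD[(String, Int)]] = None

    // Hypothetical socket source emitting "key,value" lines.
    val stream = ssc.socketTextStream("localhost", 9999)
      .map { line =>
        val Array(k, v) = line.split(",")
        (k, v.toInt)
      }

    stream.foreachRDD { currentRdd =>
      previousBatch.foreach { prevRdd =>
        // Join the new batch against the RDD cached in the last cycle.
        val joined = currentRdd.join(prevRdd)
        println(s"joined records: ${joined.count()}")
        // Explicitly release the old RDD so it is not joined again later.
        prevRdd.unpersist()
      }
      // Cache the current batch for use in the next cycle.
      previousBatch = Some(currentRdd.cache())
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```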
Upvotes: 1
Views: 1878
Reputation: 37435
In Spark Streaming, the time-to-live of the RDDs generated by the streaming process is controlled by the spark.cleaner.ttl configuration. It defaults to infinite, but for it to have any effect you also need to set spark.streaming.unpersist to false, so that Spark Streaming 'lets live' the RDDs it generates instead of unpersisting them after each batch.
Note that there is no per-RDD TTL.
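A minimal sketch of how these two settings might be applied when building the streaming context; the application name, batch interval, and TTL value are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative values only: app name, batch interval, and TTL are assumptions.
val conf = new SparkConf()
  .setAppName("StreamingTtlExample")
  // Stop Spark Streaming from unpersisting generated RDDs after each batch.
  .set("spark.streaming.unpersist", "false")
  // Let the cleaner remove persisted RDDs (and other metadata) after 3600 s.
  .set("spark.cleaner.ttl", "3600")

val ssc = new StreamingContext(conf, Seconds(10))
```

With this setup, RDDs from earlier batches stay cached until the cleaner's TTL expires; since the TTL applies globally rather than per RDD, any batch-level bookkeeping (such as unpersisting a specific old RDD) still has to be done in your own code.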
Upvotes: 1