Reputation: 51
I use Spark 2.0.2 (in DSE / DataStax Enterprise 5.1) to run a streaming app.
For each micro-batch, my Spark Streaming app makes some calls to RDD.persist(), and RDD.unpersist() is NEVER called (so far, we rely on the LRU eviction of the cache space to unpersist).
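To illustrate, the pattern is roughly the following (a minimal sketch rather than my real job; the socket source, host/port, local master, MEMORY_ONLY level and the count() action are just placeholders I picked for the example):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PersistOnlyStreamingApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-only-example").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder source; any DStream would show the same behaviour.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      // Persist the micro-batch RDD...
      rdd.persist(StorageLevel.MEMORY_ONLY)
      // ...run an action so it actually gets cached...
      rdd.count()
      // ...and never call rdd.unpersist(): we rely on LRU eviction.
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```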
I thought I would see the list of persisted RDDs grow quite a bit in the "Storage" tab of the Spark UI.
However, I see only a VERY limited list of persisted RDDs in this "Storage" tab: at most about 10 persisted RDDs of roughly 1.5 MB each, i.e. about 15 MB of occupied space, which is very little given that each executor has 1.5 GB of heap.
So I wonder: are memory-persisted RDDs unpersisted at the end of a Spark Streaming micro-batch?
Thanks.
Upvotes: 0
Views: 388
Reputation: 108
Spark won't unpersist the RDDs at the end of the batch; cached blocks are simply evicted from memory on an LRU basis when space is needed.
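If you want the memory released deterministically instead of waiting for eviction, one option (a minimal sketch, reusing the same placeholder socket source as in the question's example, nothing DSE-specific) is to call unpersist() yourself once the batch's actions have run; SparkContext.getPersistentRDDs shows what is still registered as cached:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ExplicitUnpersistApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("explicit-unpersist-example").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      rdd.persist(StorageLevel.MEMORY_ONLY)
      rdd.count()                       // run whatever actions need the cached data
      rdd.unpersist(blocking = false)   // then release the cached blocks explicitly
      // The driver can check what is still registered as persisted:
      println(s"still cached: ${rdd.sparkContext.getPersistentRDDs.size} RDD(s)")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```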
Upvotes: 0