Reputation: 2194
I'm reading "Learning Spark" and noticed this kind of code:
val result = input.map(x => x * x)
result.persist(StorageLevel.DISK_ONLY)
println(result.count())
println(result.collect().mkString(","))
Does this code really persist the result RDD? I thought everything in Spark was immutable, but in this case it looks like we are mutating the result RDD.
Shouldn't this piece of code be written like this instead?
val result = input.map(x => x * x)
val persistedResult = result.persist(StorageLevel.DISK_ONLY)
println(persistedResult.count())
println(persistedResult.collect().mkString(","))
There are many more code samples like this in the book, so that got me wondering...
Upvotes: 3
Views: 65
Reputation: 9425
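Unlike typed transformations, persist() is applied to the dataset it is called on (this) rather than producing a new one. This is because persist only marks the dataset as persistent; nothing is stored until an action first computes it. From the Spark Programming Guide: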
You can mark an RDD to be persisted using the persist() or cache() methods on it. The first time it is computed in an action, it will be kept in memory on the nodes.
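To make the "marks this dataset" point concrete, here is a minimal sketch, assuming a local Spark installation on the classpath. The object name PersistDemo, the app name and the sample input 1 to 5 are made up for illustration. It checks that persist hands back the very same RDD instance and only changes its storage level, so both versions of the code in the question should behave the same way.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names and sample data are assumptions, not from the book.
object PersistDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("persist-demo")
      .getOrCreate()
    val sc = spark.sparkContext

    val input = sc.parallelize(1 to 5)
    val result = input.map(x => x * x)

    // Before persist: no storage level has been requested for this RDD.
    println(result.getStorageLevel == StorageLevel.NONE)

    // persist only records the requested level; no data is written yet.
    val persistedResult = result.persist(StorageLevel.DISK_ONLY)

    // Reference equality: persist returns the RDD it was called on.
    println(persistedResult eq result)
    println(result.getStorageLevel == StorageLevel.DISK_ONLY)

    // The first action computes the RDD and stores its partitions;
    // the second action can then read them back instead of recomputing.
    println(result.count())
    println(result.collect().mkString(","))

    spark.stop()
  }
}
```

So the data an RDD describes stays immutable, while the storage level is a piece of bookkeeping attached to the RDD object itself. Capturing the return value in a new val is harmless but, under this reading, not required.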
Upvotes: 2