Why is persist used like an action in Holden Karau's book "Learning Spark"?

Question

I'm reading "Learning Spark" and noticed this kind of code:

val result = input.map(x => x * x)
result.persist(StorageLevel.DISK_ONLY)
println(result.count())
println(result.collect().mkString(","))

Does this code really persist the result RDD? I thought everything in Spark was immutable, but in this case it looks like we are mutating the result RDD.

Shouldn't this piece of code be written like this instead?

val result = input.map(x => x * x)
val persistedResult = result.persist(StorageLevel.DISK_ONLY)
println(persistedResult.count())
println(persistedResult.collect().mkString(","))

There are many more code samples like this in the book, so that got me wondering...

mazaneicha · Accepted Answer

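Unlike typed transformations, which return a new dataset, persist() is applied to this dataset, that is, the one it is called on, and it returns that same dataset. This is because persist() only marks the dataset to be persisted; nothing is stored until an action computes it. Both versions of your code therefore behave the same. From the Spark Programming Guide: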

You can mark an RDD to be persisted using the persist() or cache() methods on it. The first time it is computed in an action, it will be kept in memory on the nodes.
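
To check this yourself, here is a minimal sketch, assuming a local Spark installation. The object name PersistDemo, the helper demo, the app name, and the sample input are made up for illustration; persist, getStorageLevel, count and collect are the regular RDD methods. It shows that persist() hands back the very same RDD object and only sets its storage level, which is bookkeeping on the RDD and not a change to its data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistDemo {
  // Hypothetical helper: returns
  // (same instance?, storage level after persist, count, collected values).
  def demo(sc: SparkContext): (Boolean, StorageLevel, Long, String) = {
    val input = sc.parallelize(Seq(1, 2, 3, 4)) // sample data, not from the book
    val result = input.map(x => x * x)

    // persist() only records the requested storage level on `result`
    // and returns that same object; no job runs at this point.
    val persistedResult = result.persist(StorageLevel.DISK_ONLY)
    val sameInstance = persistedResult eq result

    // The first action computes the partitions and stores them on disk;
    // the second action can then read the stored partitions.
    val n = result.count()
    val values = result.collect().mkString(",")

    (sameInstance, result.getStorageLevel, n, values)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(demo(sc)) finally sc.stop()
  }
}
```

Since persistedResult and result are the same reference, the book's style and the style with a separate val are interchangeable.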
