Pavel Orekhov

Reputation: 2194

Why is persist used like an action in Holden Karau's book "Learning Spark"?

I'm reading "Learning Spark" and noticed this kind of code:

val result = input.map(x => x * x)
result.persist(StorageLevel.DISK_ONLY)
println(result.count())
println(result.collect().mkString(","))

Does this code really persist the result RDD? I thought everything in Spark was immutable, but in this case it looks like we are mutating the result RDD.

Shouldn't this piece of code be written like this instead?

val result = input.map(x => x * x)
val persistedResult = result.persist(StorageLevel.DISK_ONLY)
println(persistedResult.count())
println(persistedResult.collect().mkString(","))

There are many more code samples like this in the book, so that got me wondering...

Upvotes: 3

Views: 65

Answers (1)

mazaneicha

Reputation: 9425

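Unlike typed transformations, which return a new dataset, persist() is applied to this dataset. That is because persist() only marks the dataset (here, the RDD) as persisted and returns the same instance, so both versions of your code behave the same way. Nothing is actually stored until the first action runs. From the Spark Programming Guide: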

You can mark an RDD to be persisted using the persist() or cache() methods on it. The first time it is computed in an action, it will be kept in memory on the nodes.
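
To see this for yourself, here is a minimal sketch, assuming spark-core is on the classpath and a local master is acceptable. The object name PersistDemo, the app name, and the sample input are made up for illustration; persist, getStorageLevel and StorageLevel are the regular RDD API. It checks that the value returned by persist is the very same object as result, and that the storage level of result itself has changed:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object, not taken from the book.
object PersistDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setAppName("persist-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(Seq(1, 2, 3, 4)) // sample data, made up
      val result = input.map(x => x * x)

      // Before persist: no storage level is set on the RDD.
      println(result.getStorageLevel == StorageLevel.NONE)

      val persistedResult = result.persist(StorageLevel.DISK_ONLY)

      // persist returns the RDD it was called on, not a copy.
      println(persistedResult eq result)
      // The marker is visible through the original reference as well.
      println(result.getStorageLevel == StorageLevel.DISK_ONLY)

      // The first action computes the RDD and stores its partitions;
      // the second action can then read them back from disk.
      println(result.count())
      println(result.collect().mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

All three comparisons should print true. So the RDD's data and lineage stay immutable, but the storage level is a piece of mutable bookkeeping on the RDD object, which is why the book can ignore the return value of persist.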

Upvotes: 2
