Reputation: 1304
In this line, which RDD is being persisted? dropResultsN or dataSetN?
dropResultsN = dataSetN.map(s -> standin.call(s)).persist(StorageLevel.MEMORY_ONLY());
This question arose as a side issue from my other question, "Apache Spark timing forEach operation on JavaRDD", where I am still looking for a good answer to the core question of how best to time RDD creation.
Upvotes: 0
Views: 70
Reputation: 1304
I found a good example of this in Learning Spark (O'Reilly):
It is Example 3-40, persist() in Scala (I am assuming Java behaves the same way).
import org.apache.spark.storage.StorageLevel
val result = input.map( x => x*x )
result.persist(StorageLevel.<your choice>)
NOTE in Learning Spark: Notice that we called persist() on the RDD before the first action. The persist() call on its own doesn't force evaluation.
MY NOTE: in this example the persist() call is on its own line, which I think is much clearer than the chained code in my question.
Upvotes: 0
Reputation: 17104
dropResultsN is the persisted RDD (which is the RDD produced by mapping dataSetN onto the method standin.call()).
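The reason is plain Java call chaining: persist() is invoked on whatever map() returns, not on the object that map() was called on. Here is a minimal sketch of that, using a hypothetical ToyRdd class that is NOT Spark's JavaRDD and only mimics the call shape. The toy map() is eager and the toy persist() just sets a flag, whereas Spark transformations are lazy and persist() takes a storage level. The names PersistChainDemo, ToyRdd and the persisted flag are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PersistChainDemo {

    // Toy stand-in for an RDD (hypothetical, not Spark's JavaRDD).
    static final class ToyRdd<T> {
        final String name;
        final List<T> data;
        boolean persisted = false;

        ToyRdd(String name, List<T> data) {
            this.name = name;
            this.data = data;
        }

        // Like a transformation: returns a NEW object and leaves the receiver untouched.
        <R> ToyRdd<R> map(Function<T, R> f) {
            List<R> out = new ArrayList<>();
            for (T item : data) {
                out.add(f.apply(item));
            }
            return new ToyRdd<>("map(" + name + ")", out);
        }

        // Marks the receiver and returns it, so the call can end a chain.
        ToyRdd<T> persist() {
            persisted = true;
            return this;
        }
    }

    public static void main(String[] args) {
        ToyRdd<Integer> dataSetN = new ToyRdd<>("dataSetN", Arrays.asList(1, 2, 3));

        // Chained form, as in the question: persist() runs on map()'s result.
        ToyRdd<Integer> dropResultsN = dataSetN.map(x -> x * x).persist();

        // Two-line form: the same receiver, just spelled out.
        ToyRdd<Integer> result = dataSetN.map(x -> x * x);
        result.persist();

        System.out.println("dataSetN persisted: " + dataSetN.persisted);         // false
        System.out.println("dropResultsN persisted: " + dropResultsN.persisted); // true
        System.out.println("result persisted: " + result.persisted);             // true
    }
}
```

Both forms mark only the mapped object. The source dataSetN is left unmarked unless persist() is called on it separately.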
Upvotes: 1