Apache Spark: what am I persisting here?

Question

In this line, which RDD is being persisted? dropResultsN or dataSetN?

dropResultsN = dataSetN.map(s -> standin.call(s)).persist(StorageLevel.MEMORY_ONLY());

This question arises as a side issue from "Apache Spark timing forEach operation on JavaRDD", where I am still looking for a good answer to the core question of how best to time RDD creation.

jaco0646 · Accepted Answer

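dropResultsN is the persisted RDD. The call to persist() is made on the result of dataSetN.map(...), that is, on the RDD produced by applying standin.call() to each element of dataSetN, and that result is what gets assigned to dropResultsN. dataSetN itself is not persisted by this line.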
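To make the call order concrete, here is a minimal sketch in plain Java. MiniRdd and PersistChainDemo are hypothetical names invented for illustration; this is a toy model of the method chain, not Spark's API, and unlike Spark its map() runs eagerly. It only shows that persist() acts on whatever map() returned, not on the object map() was called on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an RDD. NOT Spark's API, only a model of the call chain.
final class MiniRdd<T> {
    private final List<T> items;
    private boolean persisted = false;

    MiniRdd(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    // Like a transformation: returns a NEW object and leaves the receiver untouched.
    // (Eager here for simplicity; Spark transformations are lazy.)
    <R> MiniRdd<R> map(Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(f.apply(item));
        }
        return new MiniRdd<>(out);
    }

    // Marks the receiver as persisted and returns it so it can be assigned.
    MiniRdd<T> persist() {
        this.persisted = true;
        return this;
    }

    boolean isPersisted() {
        return persisted;
    }
}

class PersistChainDemo {
    public static void main(String[] args) {
        MiniRdd<Integer> dataSetN = new MiniRdd<>(List.of(1, 2, 3));

        // Same shape as the line in the question: map first, then persist the result.
        MiniRdd<Integer> dropResultsN = dataSetN.map(s -> s * 10).persist();

        System.out.println("dataSetN persisted: " + dataSetN.isPersisted());         // false
        System.out.println("dropResultsN persisted: " + dropResultsN.isPersisted()); // true
    }
}
```

If you wanted dataSetN cached as well, it would need its own persist() call before the map.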
