Frank Cheng

Reputation: 6196

Do we need to call cache() when there is only a single action?

For example, I have one Spark session that contains only one action and lots of transformations, and no partition will fail while the tasks execute. Is cache() unnecessary in this case, given that cache is used to share an RDD between actions?

Upvotes: 0

Views: 110

Answers (2)

philantrovert

Reputation: 10092

You pretty much answered your own question.

cache() only takes effect once at least one action has been called on the RDD you cached. This means the RDD's entire DAG has to be computed from scratch at least once; only later actions can reuse the cached partitions.

Since you only have one action, cache() will not do anything useful. It will just eat up your executor memory.
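
To make the point concrete, here is a small toy model in plain Python. It is not the Spark API: `ToyRDD`, `expensive`, and `run` are made-up names for illustration, and the model only assumes what is described above (transformations are lazy, cache() is lazy, and the first action always computes the full lineage). It counts how many times an expensive transformation runs with and without caching.

```python
# Toy model of lazy evaluation and caching. NOT the Spark API;
# ToyRDD, expensive and run are hypothetical names used for illustration.

class ToyRDD:
    def __init__(self, compute):
        self._compute = compute        # zero-argument function producing the data
        self._cache_requested = False
        self._cached = None

    def map(self, fn):
        # Transformation: lazy, only records how to derive the new dataset.
        return ToyRDD(lambda: [fn(x) for x in self._materialize()])

    def cache(self):
        # Also lazy: marks the dataset for caching, computes nothing yet.
        self._cache_requested = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached        # reuse the result of an earlier action
        data = self._compute()         # recompute the lineage from scratch
        if self._cache_requested:
            self._cached = data
        return data

    def count(self):
        # Action: forces the computation.
        return len(self._materialize())


calls = {"n": 0}

def expensive(x):
    calls["n"] += 1                    # count every real evaluation
    return x * x

def run(use_cache, num_actions):
    calls["n"] = 0
    rdd = ToyRDD(lambda: [1, 2, 3]).map(expensive)
    if use_cache:
        rdd.cache()
    for _ in range(num_actions):
        rdd.count()
    return calls["n"]

one_action_plain = run(use_cache=False, num_actions=1)
one_action_cached = run(use_cache=True, num_actions=1)
two_actions_plain = run(use_cache=False, num_actions=2)
two_actions_cached = run(use_cache=True, num_actions=2)
```

In this sketch a single action evaluates `expensive` the same number of times whether or not cache() was called, so caching buys nothing and only holds on to the data. The saving appears only with the second action, where the cached run skips the recomputation.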

Upvotes: 3

petertc

Reputation: 3921

No, you do not need to call cache() in your case.

Upvotes: 1
