Prashant

Reputation: 772

cache at each step in DAG during execution

I am running a Spark SQL application, and in the stages that are created, the DAG shows a cache operation on every RDD that is created internally. My application has a series of statements (e.g. val df1 = .....), and after all the transformations I call cache followed by count on the last DataFrame. I am trying to understand why the DAG shows "Cache" for everything.

(Image: DAG of a stage)

Upvotes: 0

Views: 441

Answers (1)

user10207956

Reputation: 26

It doesn't cache at every step. Persistence in the DAG visualization is denoted by a green circle.
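
One way to check this outside the UI is to ask Spark which DataFrames are actually persisted. The following is a minimal sketch, assuming Spark 2.1 or later (for `Dataset.storageLevel`) and a local session; the names `df1`, `df2`, `df3`, `CacheCheck` and `cachedFlags` are made up for illustration and only mimic the chain described in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  // Builds a small chain of DataFrames, caches only the last one,
  // and reports for each whether Spark has it marked as persisted.
  def cachedFlags(spark: SparkSession): Seq[(String, Boolean)] = {
    import spark.implicits._

    // Hypothetical stand-ins for the "val df1 = ....." statements.
    val df1 = spark.range(0, 1000).toDF("id")
    val df2 = df1.filter($"id" % 2 === 0)
    val df3 = df2.withColumn("doubled", $"id" * 2)

    df3.cache()  // marks only df3 for persistence
    df3.count()  // action that triggers the job and materializes the cache

    Seq("df1" -> df1, "df2" -> df2, "df3" -> df3).map { case (name, df) =>
      name -> (df.storageLevel != StorageLevel.NONE)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-check")
      .master("local[*]")  // assumption: local run, adjust for a cluster
      .getOrCreate()

    cachedFlags(spark).foreach { case (name, cached) =>
      println(s"$name persisted: $cached")
    }
    spark.stop()
  }
}
```

If this behaves as expected, only `df3` should report a storage level other than `NONE`. The Storage tab of the Spark UI should list the same persisted data.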

The "Cache" label you see refers to the call site that triggered the job execution.

Upvotes: 1
