Reputation: 399
Spark's transformations have to be triggered by calling actions. What exactly does Spark do if no action is called? And which parts or processes are involved in handling a lazy operation (e.g. a transformation) before its execution is triggered?
Upvotes: 2
Views: 1989
Reputation: 74619
tl;dr Spark does almost nothing (compared to what it does once a job actually runs).
Applying transformations creates an RDD lineage, i.e. a DAG of RDDs. That's how an RDD lives up to the R in its name: it is resilient and can recover when map outputs go missing. Nothing is executed on the executors: no serialization, no sending data over the wire, no other network-related activity. All Spark does is create new RDDs out of existing ones, building up a graph of RDDs.
Every transformation call returns a new RDD. You start with a SparkContext and build a "pipeline" by applying transformations.
Only when an action is called is a job submitted. DAGScheduler then turns the RDDs into stages, which are submitted as TaskSets (handled by TaskSetManagers) and in turn executed as parallel tasks on executors.
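To make the idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: ToyRDD, traced_double and the calls list are hypothetical names made up for illustration, and there is no scheduler, stage, or executor involved. It only shows that transformations record lineage while the action is what runs the functions.

```python
# Toy model of lazy evaluation. NOT Spark code: ToyRDD and its methods
# are hypothetical and only mimic the "record now, run later" idea.

class ToyRDD:
    def __init__(self, parent=None, fn=None, data=None, name="source"):
        self.parent = parent  # previous node in the lineage (None for a source)
        self.fn = fn          # how to derive this node from the parent's iterator
        self.data = data      # only set on the source node
        self.name = name

    def map(self, f):
        # Transformation: returns a new node, runs nothing.
        return ToyRDD(parent=self, fn=lambda it: (f(x) for x in it), name="map")

    def filter(self, p):
        # Transformation: returns a new node, runs nothing.
        return ToyRDD(parent=self, fn=lambda it: (x for x in it if p(x)), name="filter")

    def lineage(self):
        # Walk back through the parents, newest node first.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def _compute(self):
        if self.parent is None:
            return iter(self.data)
        return self.fn(self.parent._compute())

    def collect(self):
        # Action: the only place where the recorded functions actually run.
        return list(self._compute())


calls = []  # records every element the mapped function is applied to

def traced_double(x):
    calls.append(x)
    return x * 2

source = ToyRDD(data=[1, 2, 3, 4])
pipeline = source.map(traced_double).filter(lambda x: x > 4)

calls_before_action = list(calls)  # snapshot taken before any action
result = pipeline.collect()        # the action triggers the computation

print(pipeline.lineage())
print(calls_before_action, result)
```

In this sketch calls stays empty until collect() is invoked, which mirrors the point above: building the pipeline only builds the graph.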
p.s. A couple of transformations, however, do trigger a job, e.g. sortBy or zipWithIndex. See https://issues.apache.org/jira/browse/SPARK-1021.
Upvotes: 4
Reputation: 31526
My understanding is that before any action is called, Spark is only building the DAG.
It's when you call an action that Spark executes the DAG it has been building so far.
So if you don't call an action, no processing is done; Spark is only building the DAG.
Upvotes: 0