lary

Reputation: 399

What does Spark actually do before an action is called?

Spark's transformations have to be triggered by calling actions. What exactly does Spark do if no action is called? And which components or processes are involved in handling a lazy operation (e.g. a transformation) before its execution is triggered?

Upvotes: 2

Views: 1989

Answers (2)

Jacek Laskowski

Reputation: 74619

tl;dr Spark does almost nothing (given what it does in general).

Applying transformations creates an RDD lineage, i.e. a DAG of RDDs. That is how an RDD earns the R in its name: it is resilient and can recover from missing map outputs by recomputing them from the lineage. Nothing executes on executors, and there is no serialization, sending over the wire, or similar network-related activity. All Spark does is create new RDDs out of existing ones, building a graph of RDDs.

Every transformation call returns a new RDD. You start with a SparkContext and build a "pipeline" applying transformations.
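To make the "builds a graph, runs nothing" point concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the ToyRDD class and all of its methods are hypothetical names invented for illustration, and a real RDD carries much more (partitions, dependencies, a partitioner).

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a parent link plus a function."""

    def __init__(self, source=None, parent=None, op=None, name="source"):
        self.source = source  # only set on the root dataset
        self.parent = parent  # the ToyRDD this one was derived from
        self.op = op          # function applied to the parent's iterator
        self.name = name

    # Transformations: each returns a NEW ToyRDD and computes nothing.
    def map(self, f):
        return ToyRDD(parent=self, op=lambda it: (f(x) for x in it), name="map")

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda it: (x for x in it if pred(x)), name="filter")

    def lineage(self):
        """Walk the parent links back to the root (a linear lineage here)."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def _compute(self):
        if self.parent is None:
            return iter(self.source)
        return self.op(self.parent._compute())

    # Action: the only place where the recorded functions actually run.
    def collect(self):
        return list(self._compute())


calls = []  # records every element the map function actually touches


def double(x):
    calls.append(x)
    return x * 2


numbers = ToyRDD(source=[1, 2, 3, 4])
pipeline = numbers.map(double).filter(lambda x: x > 4)

calls_before_action = list(calls)  # snapshot taken before any action
result = pipeline.collect()        # the action triggers the computation
```

In this sketch calls_before_action stays empty, because map and filter only record what to do; double is invoked only once collect() walks the chain.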

Only when an action is called is a job submitted. At that point the DAGScheduler turns the RDD lineage into stages, each submitted as a TaskSet (managed by a TaskSetManager), whose tasks are then executed in parallel on executors.
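As a rough illustration of the stage-building step, the sketch below cuts a linear lineage into stages wherever an operation needs a shuffle. It is a simplification under stated assumptions: the lineage list, the needs_shuffle flags and split_into_stages are hypothetical, and the real DAGScheduler works on a graph of RDD dependencies rather than a flat list.

```python
# Toy linear lineage: (operation name, needs_shuffle).
# The flags are assigned by hand here; Spark derives them from RDD dependencies.
lineage = [
    ("textFile", False),
    ("flatMap", False),
    ("map", False),
    ("reduceByKey", True),   # wide dependency: a shuffle boundary
    ("mapValues", False),
    ("sortByKey", True),     # another shuffle boundary
]


def split_into_stages(ops):
    """Start a new stage at every operation that needs a shuffle."""
    stages, current = [], []
    for name, needs_shuffle in ops:
        if needs_shuffle and current:
            stages.append(current)
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages


stages = split_into_stages(lineage)
for i, stage in enumerate(stages):
    print(f"stage {i}: {' -> '.join(stage)}")
```

Operations without a shuffle are pipelined inside one stage, which is why the first three steps end up together.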

p.s. A couple of transformations, such as sortBy and zipWithIndex, do trigger a job, however. See https://issues.apache.org/jira/browse/SPARK-1021.
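One way to see why zipWithIndex cannot be fully lazy: each partition's starting index depends on how many elements the partitions before it hold, and counting means reading data. The sketch below is a toy model of that idea in plain Python; partitions, start_offsets and zip_with_index are hypothetical names, not Spark's implementation.

```python
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]


def start_offsets(parts):
    """Eager pass over the data: count elements to find each partition's first index."""
    offsets, total = [], 0
    for part in parts:
        offsets.append(total)
        total += len(part)
    return offsets


def zip_with_index(parts):
    offsets = start_offsets(parts)  # must run up front, like the job Spark triggers
    # The indexing itself could stay lazy; only the counting has to happen first.
    return [
        [(x, offset + i) for i, x in enumerate(part)]
        for part, offset in zip(parts, offsets)
    ]


indexed = zip_with_index(partitions)
```

sortBy is in a similar position for a different reason: range partitioning needs to sample the data to choose partition boundaries.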

Upvotes: 4

Knows Not Much

Reputation: 31526

My understanding is that before any action is called, Spark is only building the DAG.

It's only when you call an action that Spark executes the DAG it has been building so far.

So if you don't call an action, no processing is done; Spark is only building the DAG.

Upvotes: 0
