chenzhongpu

Reputation: 6871

Does the master node execute actual tasks in Spark?

My question may sound silly, but it has bothered me for a long time.

[Image: components of a distributed Spark application]

The picture above shows the components of a distributed Spark application. I think it indicates that the master node never executes actual tasks and only serves as the cluster manager. Is that true?

By the way, "tasks" here refers to user-submitted tasks.

Upvotes: 3

Views: 1794

Answers (2)

loneStar

Reputation: 4010

To explain a bit more on the different roles:

The driver prepares the context and declares the operations on the data using RDD transformations and actions.

The driver submits the serialized RDD graph to the master. The master creates tasks out of it and submits them to the workers for execution. It coordinates the different job stages.

The workers are where the tasks are actually executed. They should have the resources and network connectivity required to execute the operations requested on the RDDs.
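The split between coordinating and executing can be mimicked in plain Python. This is only a toy sketch and not Spark's API: `split`, `run_task`, `run_job`, and the thread pool standing in for the worker nodes are all hypothetical names chosen for illustration. The coordinating side only cuts the job into tasks and combines the results, while the pool does the actual computation.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, num_partitions):
    """Coordinator side: cut the data into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_task(partition):
    """Worker side: the actual computation (sum of squares of one partition)."""
    return sum(x * x for x in partition)


def run_job(data, num_workers=4):
    """Coordinator side: one task per partition, executed by the worker pool.

    The thread pool is an assumed stand-in for worker nodes; the coordinator
    itself only schedules the tasks and combines the partial results.
    """
    partitions = split(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partials = list(workers.map(run_task, partitions))
    return sum(partials)


if __name__ == "__main__":
    print(run_job(list(range(10))))
```

Calling `run_job(list(range(10)))` creates four tasks, and none of the squaring happens in the coordinating function itself.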

Upvotes: 1

Sim

Reputation: 13528

Yes, the master node executes the driver process and does not run tasks. Tasks run in executor processes on the worker nodes. The master node is rarely stressed from a CPU standpoint but, depending on how broadcast variables, accumulators and collect are used, it may be quite stressed in terms of RAM usage.
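To see why something like collect can stress RAM on the driver side, here is a toy comparison in plain Python. It is not Spark code: `collect_all` and `partial_sums` are hypothetical helpers, and the nested lists merely stand in for partitions held by workers. The point is only how many values end up in the driver's memory in each case.

```python
def collect_all(partitions):
    """Stand-in for a collect-style action: every element is copied to the driver."""
    return [x for part in partitions for x in part]


def partial_sums(partitions):
    """Stand-in for reducing on the workers: each one returns a single number."""
    return [sum(part) for part in partitions]


if __name__ == "__main__":
    # Four assumed "worker" partitions of 1000 elements each.
    partitions = [list(range(i * 1000, (i + 1) * 1000)) for i in range(4)]

    collected = collect_all(partitions)  # driver holds every element
    partials = partial_sums(partitions)  # driver holds one value per partition

    print(len(collected), len(partials))
    # Both routes give the same total; only the driver-side footprint differs.
    print(sum(collected) == sum(partials))
```

In this sketch the driver ends up holding every element in the first case and only one value per partition in the second, which is the kind of difference that matters once the data no longer fits on a single machine.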

Upvotes: 5
