Reputation: 6871
My question may sound silly, but it has bothered me for a long time.
The picture shown above shows the components of a distributed Spark application. I think it indicates that the master node never executes actual tasks, but only serves as the cluster manager. Is that true?
By the way, the tasks here refer to the user-submitted tasks.
Upvotes: 3
Views: 1794
Reputation: 4010
To explain a bit more about the different roles:
The driver prepares the context and declares the operations on the data using RDD transformations and actions (a minimal example is sketched below).
The driver submits the serialized RDD graph to the master. The master creates tasks out of it and submits them to the workers for execution. The master also coordinates the different job stages.
The workers are where the tasks are actually executed. They should have the resources and network connectivity required to execute the operations requested on the RDDs.
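Here is a minimal sketch of what the driver-side code looks like (the master URL and HDFS paths are placeholders of my own, not taken from any real deployment). The transformations only declare the RDD graph on the driver; nothing runs on the workers until the action at the end is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverSketch {
  def main(args: Array[String]): Unit = {
    // "spark://master:7077" is a placeholder standalone-master URL
    val conf = new SparkConf().setAppName("driver-sketch").setMaster("spark://master:7077")
    val sc   = new SparkContext(conf)

    // Transformations: these only build up the RDD graph lazily on the driver;
    // no work is performed on the cluster yet
    val lines  = sc.textFile("hdfs:///data/input.txt")   // hypothetical input path
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Action: the graph is now split into stages and tasks, which are
    // scheduled onto executors running on the worker nodes
    counts.saveAsTextFile("hdfs:///data/output")          // hypothetical output path

    sc.stop()
  }
}
```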
Upvotes: 1
Reputation: 13528
Yes, the master node executes the driver process and does not run tasks. Tasks run in executor processes on the worker nodes. The master node is rarely stressed from a CPU standpoint but, depending on how broadcast variables, accumulators and collect are used, it may be quite stressed in terms of RAM usage.
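To make that concrete, here is a small sketch (the names and data are made up for illustration): the broadcast value is materialized in the driver's heap before being shipped to executors, and collect() pulls every partition of the result back into the driver, which is what can make the node running the driver RAM-hungry.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverMemorySketch {
  def main(args: Array[String]): Unit = {
    // Local master just so the sketch runs standalone; in the question's setup
    // this would be the cluster manager's URL instead
    val sc = new SparkContext(new SparkConf().setAppName("driver-memory-sketch").setMaster("local[*]"))

    // The lookup table lives in driver memory first, then is broadcast to the executors
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // Transformations run in executor processes on the worker nodes
    val rdd    = sc.parallelize(Seq("a", "b", "a", "c"))
    val mapped = rdd.map(k => lookup.value.getOrElse(k, 0))

    // collect() brings the whole result back into the driver's heap;
    // on a large RDD this is what stresses the driver node's RAM
    val result: Array[Int] = mapped.collect()
    println(result.sum)

    sc.stop()
  }
}
```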
Upvotes: 5