srini

Reputation: 41

Better understanding of communication between YARN and Spark

I would like to get a better understanding of the communication exchange between YARN and Spark. For example:

  1. What happens from the moment a Spark job is triggered until YARN allocates the resources?
  2. What happens when a Spark job requests more resources than YARN currently has available?
  3. What happens when a Spark job requests more resources than the total cluster capacity?

Upvotes: 1

Views: 466

Answers (1)

moriarty007

Reputation: 2214

Steps that happen when we run spark-submit in YARN client mode:

  1. The Spark driver internally invokes the submitApplication method of the Client class. This submits the Spark application to the YARN cluster (i.e. to the YARN ResourceManager) and returns the application's ApplicationId.

  2. Spark then uses the ApplicationId generated in step 1 and calls the createContainerLaunchContext method. This builds a YARN ContainerLaunchContext request asking a YARN NodeManager to launch the ApplicationMaster in a container (a minimal sketch of this submission handshake is shown after this list).

  3. Step 2 is responsible for launching the ApplicationMaster for the application. If the cluster does not have the resources to start the AM, the submission fails and the driver shuts down with an exception. Once the AM is up and running, it contacts the driver to report that it is up. At this point the Spark YARN application is up and running.

  4. The driver then asks the AM for resources (executors), and the AM forwards the same request to the YARN ResourceManager.

  5. If YARN does not have that much capacity, it gives the Spark application whatever it can; if it does have the capacity, it gives everything that was asked for (a driver-side sketch of how these resource requests are expressed follows below).
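To make steps 1-3 more concrete, here is a minimal sketch of the same submission handshake written directly against the Hadoop YARN client API (which is roughly what Spark's Client class wraps). The application name, the memory/core numbers, and the echo command used as the AM launch command are placeholders for illustration, not what Spark actually submits:

```scala
import java.util.Collections

import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.Records

object YarnSubmitSketch {
  def main(args: Array[String]): Unit = {
    // Connect to the YARN ResourceManager using the cluster configuration
    // found on the classpath (HADOOP_CONF_DIR / YARN_CONF_DIR).
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Step 1: ask the ResourceManager for a new application.
    // This is where the ApplicationId comes from.
    val app = yarnClient.createApplication()
    val appContext = app.getApplicationSubmissionContext
    val appId = appContext.getApplicationId
    println(s"Got ApplicationId: $appId")

    // Step 2: build a ContainerLaunchContext describing the container a
    // NodeManager should start for the ApplicationMaster. Spark's Client
    // builds a java command that starts its own AM class; the echo below
    // is only a placeholder.
    val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
    amContainer.setCommands(Collections.singletonList("echo launching-AM"))

    val amResource = Records.newRecord(classOf[Resource])
    amResource.setMemorySize(1024) // MB for the AM container (Hadoop 2.8+ setter)
    amResource.setVirtualCores(1)

    appContext.setApplicationName("yarn-submit-sketch")
    appContext.setAMContainerSpec(amContainer)
    appContext.setResource(amResource)

    // Step 3: submit. The ResourceManager now tries to find a NodeManager
    // with enough room to launch the AM container; if it cannot, the
    // application fails instead of running.
    yarnClient.submitApplication(appContext)
    println(s"State: ${yarnClient.getApplicationReport(appId).getYarnApplicationState}")

    yarnClient.stop()
  }
}
```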

More details here - https://jaceklaskowski.gitbooks.io/mastering-apache-spark/yarn/spark-yarn-client.html
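For steps 4-5, the amounts the AM asks the ResourceManager for come from the driver-side configuration. Below is a minimal client-mode sketch, assuming HADOOP_CONF_DIR points at the cluster configuration; the app name and executor numbers are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object YarnClientModeDemo {
  def main(args: Array[String]): Unit = {
    // These settings are what the driver hands to the ApplicationMaster,
    // which then requests matching containers from the ResourceManager.
    // As described in step 5, if YARN cannot satisfy all of them, the
    // application simply runs with fewer executors than requested.
    val spark = SparkSession.builder()
      .appName("yarn-client-demo")
      .master("yarn")                              // run on YARN
      .config("spark.submit.deployMode", "client") // driver stays on the submitting machine
      .config("spark.executor.instances", "4")     // executors requested from the RM
      .config("spark.executor.memory", "2g")       // memory per executor container
      .config("spark.executor.cores", "2")         // vcores per executor container
      .getOrCreate()

    // A trivial job so the granted executors are actually exercised.
    val count = spark.sparkContext.parallelize(1 to 1000000).count()
    println(s"count = $count")

    spark.stop()
  }
}
```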

Upvotes: 2
