CuriousMind

Reputation: 8903

How does YARN decide which type of ApplicationMaster to launch?

I referred to this link and got a fair understanding of how YARN works. YARN can run multi-tenant applications, for example MR, Spark, etc.

The key point is the application-specific ApplicationMaster (AM).

When a client submits a job to the ResourceManager (RM), how does the RM know what kind of application it is (MR, Spark) so that it can launch the appropriate ApplicationMaster?

Can anyone explain how the RM comes to know what kind of job is being submitted to it?

EDIT:

This question is about how the RM knows what kind of job has been submitted, not about any relationship between YARN, MR, or Spark.

The RM receives a job and has to launch a first container that runs the application-specific ApplicationMaster. So how does the RM know what kind of job has been submitted to it?

That is the question I am asking, and it is not the same as the question this has been marked a duplicate of.

Upvotes: 2

Views: 1045

Answers (1)

ernest_k

Reputation: 45319

YARN does not need or want to know the type of application running on it. It provides resources, and it is the concern of the application running on it to know how to obtain those resources from YARN in order to run what it needs to run (nothing in YARN's architecture suggests that YARN wants to know what tasks run on it, or how).

There's more information here on how to write components that integrate with YARN.

As I understand it from the two-step process of writing a YARN application, one needs to write a YARN client as well as a YARN application master.

  • An application client determines what to run as application master:

    // Construct the command to be executed on the launched container 
    String command = 
        "${JAVA_HOME}" + "/bin/java" +
        " MyAppMaster" + 
        " arg1 arg2 arg3" +
        ...
    

    Where MyAppMaster is the application-specific master class.

  • The second part is the task that runs in the container. Note the kind of command that the application master provides to run the container (which runs the actual task executors):

    // Set the necessary command to execute on the allocated container 
    String command = "/bin/sh ./MyExecShell.sh";
    

As you can see, this is application-provided code that knows about the task (or the type of application, to use the question's words). Further down on the same page, you can see how applications are submitted to YARN.
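To make the point concrete, here is a toy model in plain Java. It is not the Hadoop API: LaunchRequest, ToyResourceManager, and OtherAppMaster are hypothetical names invented for illustration. It only sketches the idea that the resource manager receives an opaque command from the client and starts it in the first container, with no branching on an application type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for what a client hands over at submission time.
// It carries an opaque command line; there is no "application type" field.
final class LaunchRequest {
    final String appName;
    final String amCommand;

    LaunchRequest(String appName, String amCommand) {
        this.appName = appName;
        this.amCommand = amCommand;
    }
}

// Toy resource manager (not YARN's): it records the command it would
// start in the first container of each submitted application.
final class ToyResourceManager {
    private final List<String> launched = new ArrayList<>();

    // The same code path handles every submission, whatever framework sent it.
    String submit(LaunchRequest request) {
        int containerId = launched.size() + 1;
        String started = "container_" + containerId + " runs: " + request.amCommand;
        launched.add(started);
        return started;
    }

    List<String> launched() {
        return launched;
    }
}

final class ToyYarnDemo {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        // Two different "frameworks", each naming its own master class.
        System.out.println(rm.submit(
            new LaunchRequest("my-app", "${JAVA_HOME}/bin/java MyAppMaster arg1 arg2 arg3")));
        System.out.println(rm.submit(
            new LaunchRequest("other-app", "${JAVA_HOME}/bin/java OtherAppMaster")));
    }
}
```

The only thing that differs between the two submissions is the command string, and that string comes from the client, not from the resource manager.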

Now to put that in the perspective of Spark: Spark has its own application master class (check here or the entire package). These are hidden from the Spark application developer because the framework provides built-in integration with YARN, which happens to be just one of Spark's supported resource managers.

If you were to write your own YARN client that executes, say, Python code, then you'd have to follow the steps in the YARN application client/master documentation in order to supply YARN with the commands, configuration, resources, and environment to be used to execute your application's specific logic or tasks.
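As a rough sketch of the "commands" part for such a Python case, the client could assemble the launch command for its own master process as below. Everything here is an assumption made for illustration: the class PythonAmCommand, the script name my_app_master.py, and the arguments are hypothetical, and the "<LOG_DIR>" placeholder is assumed to match what Hadoop exposes as ApplicationConstants.LOG_DIR_EXPANSION_VAR. The sketch only builds the string; handing it to YARN is covered by the documentation mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PythonAmCommand {
    // Placeholder for the container log directory. Assumption: this matches
    // the value of Hadoop's ApplicationConstants.LOG_DIR_EXPANSION_VAR.
    static final String LOG_DIR = "<LOG_DIR>";

    // Joins interpreter, script, and arguments into one command line and
    // redirects stdout/stderr into the container's log directory.
    static String build(String interpreter, String script, List<String> args) {
        List<String> parts = new ArrayList<>();
        parts.add(interpreter);
        parts.add(script);
        parts.addAll(args);
        parts.add("1>" + LOG_DIR + "/stdout");
        parts.add("2>" + LOG_DIR + "/stderr");
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // my_app_master.py is a hypothetical application master script.
        System.out.println(build("python3", "my_app_master.py", Arrays.asList("arg1", "arg2")));
    }
}
```

Running main prints `python3 my_app_master.py arg1 arg2 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr`. The command names a Python script rather than a Java class, and nothing else about the submission needs to tell YARN what "type" of application this is.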

Upvotes: 3
