
Reputation: 2431

Design questions for long running yarn applications

I am trying to write a YARN application and was hoping to get some suggestions on a few design questions I have in mind. I have gone through the simpler sample apps like distributed shell and some variations of it, so I am familiar with the basic API. What I would like to do is create an application with a web interface that users can interact with and use to submit some kind of tasks (the nature of the tasks is irrelevant). Based on that work, the UI requests containers to do the processing.

The ideal arrangement I have in mind is that my application master (AM) provides this web UI, and no containers are allocated until someone visits the AM's website and requests some work. At that point, the AM should be able to request new containers and assign work to them.
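A minimal, dependency-free sketch of this arrangement in Java, under stated assumptions: `ContainerRequester`, `OnDemandDispatcher` and their methods are hypothetical names for illustration. In a real AM, `requestContainer` would wrap `AMRMClientAsync#addContainerRequest(new AMRMClient.ContainerRequest(...))`, and `onContainerAllocated`/`onContainerCompleted` would be driven by the `onContainersAllocated`/`onContainersCompleted` callbacks. Container requests can be added at any time after the AM registers, not only at startup, which is what makes this on-demand pattern possible.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the AM's resource-manager client. In a real AM this
// would wrap AMRMClientAsync#addContainerRequest(new AMRMClient.ContainerRequest(...)).
interface ContainerRequester {
    void requestContainer(int memoryMb, int vcores);
}

// Hypothetical AM-side dispatcher: no containers are requested at startup; work
// submitted through the web UI triggers container requests, up to a fixed cap.
class OnDemandDispatcher {
    private final Queue<String> pending = new ArrayDeque<>();
    private final ContainerRequester requester;
    private final int maxContainers;
    private final int memoryMb;
    private final int vcores;
    private int requested = 0; // asked for but not yet allocated
    private int running = 0;   // allocated and working on a task

    OnDemandDispatcher(ContainerRequester requester, int maxContainers, int memoryMb, int vcores) {
        this.requester = requester;
        this.maxContainers = maxContainers;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    // Called by the web UI handler when a user submits work.
    synchronized void submit(String task) {
        pending.add(task);
        topUp();
    }

    // Called when the RM allocates a container (onContainersAllocated in a real AM).
    // Returns the task to launch in it, or null if the container is surplus.
    synchronized String onContainerAllocated() {
        if (requested > 0) {
            requested--;
        }
        String task = running < maxContainers ? pending.poll() : null;
        if (task != null) {
            running++;
        }
        return task;
    }

    // Called when a container exits (onContainersCompleted in a real AM).
    synchronized void onContainerCompleted() {
        if (running > 0) {
            running--;
        }
        topUp();
    }

    synchronized int pendingTasks() {
        return pending.size();
    }

    // Ask for one container per queued task not already covered by an
    // outstanding request, without exceeding maxContainers in total.
    private void topUp() {
        while (requested < pending.size() && requested + running < maxContainers) {
            requester.requestContainer(memoryMb, vcores);
            requested++;
        }
    }
}
```

Returning `null` for a surplus allocation matters because an AM may receive more containers than it still needs; a real AM would hand those back with `releaseAssignedContainer(ContainerId)`. Resizing containers that are already allocated is version dependent: newer Hadoop releases add container update requests (`AMRMClient#requestContainerUpdate` in 3.x), so check the Javadoc for the version you run.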

1. If the AM provides the web UI, there is a problem: my understanding is that the RM chooses where the AM runs each time the application is submitted. That means the AM can have a different IP, and therefore a different URL, after an application restart. Does this behavior suggest that the AM should not be used for this purpose, and that a completely separate (non-YARN) application would be better suited to provide the web UI?
2. In all the sample YARN apps I have seen, the AM requests its containers as part of its startup. Can someone point me to the AM-related APIs that allow requesting containers at a later time, modifying the resource requirements (e.g. memory) of containers that have already been allocated, or increasing the number of containers on demand?
3. Similar to the last point, most examples focus on YARN applications that do something and then end. My application would need to keep running indefinitely (since it is a web app). For such long-running applications, does the client-to-RM API change? Is it OK to disconnect the client submission process, or to start it using & so that it runs in the background?
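For the third question, a hedged sketch. The submitting client does not have to stay alive: once `YarnClient#submitApplication` returns, the application keeps running on the cluster, and the client only needs to stick around if you want it to poll the application's status. For a service meant to run indefinitely, `ApplicationSubmissionContext` has settings aimed at AM restarts: `setMaxAppAttempts(int)`, `setKeepContainersAcrossApplicationAttempts(true)` (running containers survive an AM restart) and `setAttemptFailuresValidityInterval(long)` (only recent AM failures count toward the attempt limit). The Java below is a simplified model of that last rule only; `AttemptFailurePolicy` is a hypothetical name, and YARN's real accounting also ignores some failure types (such as preemption) that are not modeled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of YARN's AM-restart decision when an
// attempt-failures validity interval is set: only AM failures that happened
// within the last validityIntervalMs count toward maxAppAttempts.
// Real YARN also skips some failure kinds (e.g. preemption); not modeled here.
class AttemptFailurePolicy {
    private final int maxAppAttempts;
    private final long validityIntervalMs; // <= 0 means every failure counts
    private final Deque<Long> failureTimesMs = new ArrayDeque<>();

    AttemptFailurePolicy(int maxAppAttempts, long validityIntervalMs) {
        this.maxAppAttempts = maxAppAttempts;
        this.validityIntervalMs = validityIntervalMs;
    }

    // Records an AM failure at nowMs and returns true if a new attempt
    // should be started, false if the application should be failed.
    boolean onAttemptFailed(long nowMs) {
        failureTimesMs.addLast(nowMs);
        if (validityIntervalMs > 0) {
            // Drop failures that have aged out of the validity window.
            while (nowMs - failureTimesMs.peekFirst() >= validityIntervalMs) {
                failureTimesMs.removeFirst();
            }
        }
        return failureTimesMs.size() < maxAppAttempts;
    }
}
```

Without a validity interval, a long-running AM eventually exhausts its attempts no matter how far apart the failures are; with one, occasional crashes spread over weeks do not accumulate.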

I would appreciate any suggestions.

Upvotes: 1

Answers (2)

insanely_sin

Reputation: 1026

Take a look at Apache Twill. Apache Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications. With Twill, containers can run long-running processes such as servers.

The changing IP of the container serving your web UI can be handled with Apache Curator: services can register themselves in ZooKeeper using Curator's Service Discovery mechanism, and clients look them up by name instead of by a fixed address.
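To illustrate the idea behind this suggestion, here is a self-contained, in-memory model in Java rather than actual Curator code. `ServiceRegistry` and its methods are hypothetical names; with Curator itself you would build a `ServiceDiscovery` via `ServiceDiscoveryBuilder`, register a `ServiceInstance`, and resolve it with `queryForInstances(name)`. What the model captures is that an entry is tied to the session that registered it (an ephemeral znode in ZooKeeper), so a restarted AM on a new host simply registers again and clients always resolve its current address.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of session-bound service registration, the
// pattern Curator's Service Discovery implements on ZooKeeper with ephemeral
// znodes. This is not Curator code.
class ServiceRegistry {
    // service name -> (session id -> "host:port")
    private final Map<String, Map<Long, String>> services = new HashMap<>();

    // The web UI (e.g. running in the AM) registers its current address.
    synchronized void register(long sessionId, String serviceName, String hostPort) {
        services.computeIfAbsent(serviceName, name -> new HashMap<>()).put(sessionId, hostPort);
    }

    // Models ZooKeeper removing a session's ephemeral nodes when the session
    // closes or expires, e.g. because the AM process died.
    synchronized void closeSession(long sessionId) {
        for (Map<Long, String> instances : services.values()) {
            instances.remove(sessionId);
        }
    }

    // Clients resolve the stable name instead of remembering a host.
    synchronized List<String> lookup(String serviceName) {
        return new ArrayList<>(services.getOrDefault(serviceName, Collections.emptyMap()).values());
    }
}
```

A client or reverse proxy would then build the web UI URL from whatever address the lookup returns, rather than bookmarking the host the AM happened to start on.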

Upvotes: 0

PsychoMantis

Reputation: 45

Regarding question (1): you can run your AM in unmanaged mode. This allows you to run the AM outside of the YARN cluster, on a dedicated machine whose IP address you have more control over.

Upvotes: 0
