Ben

Reputation: 127

What is the purpose of uber mode in Hadoop?

New to Hadoop here. When a job runs in uber mode, the ApplicationMaster does not request containers from the ResourceManager. Instead, the AM, which runs on a single node, executes the entire job in its own JVM. This is advantageous because it avoids the overhead of negotiating containers with the RM and launching separate task processes.
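For reference, here is a minimal sketch of the configuration properties involved. These are standard MapReduce settings on YARN (Hadoop 2.x); the threshold values shown are only illustrative, and a job qualifies for uber mode only if it fits within them:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberModeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Ask the framework to consider running this job as an uber task.
        // The job still only qualifies if it fits the thresholds below.
        conf.setBoolean("mapreduce.job.ubertask.enable", true);

        // Illustrative thresholds: the job must need no more maps/reduces
        // than these, and its input must not exceed maxbytes.
        conf.setInt("mapreduce.job.ubertask.maxmaps", 9);
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
        conf.setLong("mapreduce.job.ubertask.maxbytes", 128L * 1024 * 1024);

        Job job = Job.getInstance(conf, "small-job");
        // ... set mapper, reducer, input and output paths as usual ...
    }
}
```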

What I don't understand: If a job is small enough to be completed in a reasonable amount of time on a single node, what is the point of submitting a MapReduce job in the first place? MapReduce speeds up computation by allowing computation to be performed in parallel across multiple machines. If we only intend to use one node, why not just write a regular program and run it on our local machines?

Upvotes: 3

Views: 2027

Answers (2)

Suresh Vadali

Reputation: 139

One particular scenario I experienced with Apache Crunch: a pipeline consists of a number of MapReduce (MR) jobs spun up by various DoFns (where the core logic is written). Each DoFn results in a map and/or reduce job whose output is generally stored in an immutable distributed object (a PTable or PCollection). Based on the amount of data each of these DoFns processes, the framework decides whether to run each MR job in the pipeline in uber or normal mode. So when you look at the final job counters of the pipeline, it can be a mix of uber and normal MR jobs.
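As a rough illustration (a minimal Crunch sketch; the input/output paths are hypothetical), each stage below may be planned into its own MR job, and Hadoop decides per job whether it runs uber or normal:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

public class CrunchUberExample {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(CrunchUberExample.class);
        PCollection<String> lines = pipeline.readTextFile("/data/input");

        // A DoFn holding the core logic; the Crunch planner may turn
        // this stage into its own MR job.
        PCollection<String> cleaned = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                emitter.emit(line.trim().toLowerCase());
            }
        }, Writables.strings());

        pipeline.writeTextFile(cleaned, "/data/output");

        // Each MR job the planner creates is independently eligible for
        // uber mode, depending on how much data that stage processes.
        pipeline.done();
    }
}
```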

Consider another scenario where the same M/R job runs in both incremental and full-load modes. In an incremental load, the logic may be fed a small amount of data that can be processed by a handful of mappers and a single reducer; in a full load of historical data, it may need a much larger number of mappers and reducers. The logic stays the same, but the data volume and the number of InputSplits change. In those cases you don't want to move your processing in and out of the Hadoop cluster depending on data size; you let the framework decide the mode (uber or normal).

Upvotes: 0

Binary Nerd

Reputation: 13937

Perhaps some reasons might be:

  1. You have a reusable process that can scale up if needed, in which case it might start using more containers and no longer run in uber mode.
  2. Keeping things simple. It's unlikely you would write just that one job; typically you will have many, which process varying amounts of data. Why single out one job to process its data using a different method?
  3. A program running outside of MapReduce would likely lose a number of the additional benefits provided by the framework, such as failure recovery.

Upvotes: 1
