Reputation: 1009
Hi, I am a big data newbie. I searched all over the internet to find out what exactly uber mode is, but the more I searched, the more confused I got. Can anybody please help me by answering my questions?
Upvotes: 32
Views: 32200
Reputation: 99
First we need to understand what happens when a job is submitted by the user.
It goes to the Resource Manager.
The Resource Manager coordinates with one of the Node Managers and creates a container on that node.
In this container an Application Master service is started, which handles this application locally.
The Application Master is then responsible for requesting more resources for the application from the Resource Manager.
The Application Master first checks with the Name Node to find out where the data blocks are stored, i.e. on which data nodes in the cluster the blocks are present.
After getting this information, it requests resources on those nodes, meaning containers (CPU + memory), so that data locality is preserved.
Every application has a separate Application Master.
After the Resource Manager grants more resources, containers/executors are created on those nodes. Those containers/executors are then managed by the Node Managers.
Uber Mode:
Sometimes the job is so small that it can run in the same container in which the Application Master is running. In that case it does not need to request any additional containers.
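To make the difference concrete, here is a minimal sketch that counts the containers a job occupies in each mode. It is a toy model, not Hadoop code: the function name `containers_requested` is made up for illustration, and it assumes one container per task while ignoring retries and speculative execution.

```python
def containers_requested(num_maps: int, num_reduces: int, uber: bool) -> int:
    """Toy model: count the containers one MapReduce job occupies."""
    am_container = 1  # every application gets one Application Master container
    if uber:
        # Uber mode: all tasks run inside the Application Master's container.
        return am_container
    # Normal mode (assumption): one additional container per map or reduce task.
    return am_container + num_maps + num_reduces


if __name__ == "__main__":
    print(containers_requested(3, 1, uber=False))
    print(containers_requested(3, 1, uber=True))
```

With 3 mappers and 1 reducer, the normal path asks for four task containers on top of the Application Master's own, while the uber path stays inside that single container.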
Upvotes: 0
Reputation: 1091
Pretty good answers are given for "What is Uber Mode?" Just to add some more information for "Why?"
The application master decides how to run the tasks that make up the MapReduce job. If the job is small, the application master may choose to run the tasks in the same JVM as itself. This happens when it judges the overhead of allocating and running tasks in new containers outweighs the gain in running them in parallel, when compared to running them sequentially on one node.
Now the question arises: "What qualifies as a small job?"
By default, a small job is one that has fewer than 10 mappers, only one reducer, and an input size that is less than the size of one HDFS block.
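The default thresholds described above can be written down as a simple check. This is only a sketch of the idea, not the ApplicationMaster's actual code: the names `UberLimits` and `is_small_job` are hypothetical, the 128 MB block size is an assumed value, and treating the limits as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UberLimits:
    """Hypothetical holder for the 'small job' thresholds."""
    max_maps: int = 9                    # "fewer than 10 mappers"
    max_reduces: int = 1                 # at most one reducer
    max_bytes: int = 128 * 1024 * 1024   # assumed size of one HDFS block


def is_small_job(num_maps: int, num_reduces: int, input_bytes: int,
                 limits: UberLimits = UberLimits()) -> bool:
    """Return True when the job fits under every threshold."""
    return (num_maps <= limits.max_maps
            and num_reduces <= limits.max_reduces
            and input_bytes <= limits.max_bytes)


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    print(is_small_job(4, 1, ten_mb))    # few maps, one reducer, small input
    print(is_small_job(10, 1, ten_mb))   # too many mappers
```

A job has to pass all three tests at once; failing any single one means the tasks go to separate containers as usual.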
Upvotes: 4
Reputation: 660
What is UBER mode in Hadoop2?
Normally, mappers and reducers run in separate containers allocated by the ResourceManager (RM). The uber configuration allows mappers and reducers to run in the same process as the ApplicationMaster (AM).
Uber jobs:
Uber jobs are jobs that are executed within the MapReduce ApplicationMaster. Rather than communicating with the RM to create the mapper and reducer containers, the AM runs the map and reduce tasks within its own process and avoids the overhead of launching and communicating with remote containers.
Why?
If you have a small dataset, or you want to run MapReduce on a small amount of data, the uber configuration helps by cutting the additional time that MapReduce normally spends launching containers for the mapper and reducer phases.
Can I configure uber mode for all MapReduce jobs?
As of now, map-only jobs and jobs with one reducer are supported.
Upvotes: 49
Reputation: 3752
An uber job occurs when multiple mappers and reducers are combined to run in a single container. There are four core settings for configuring uber jobs in mapred-site.xml. Configuration options for uber jobs:
mapreduce.job.ubertask.enable
mapreduce.job.ubertask.maxmaps
mapreduce.job.ubertask.maxreduces
mapreduce.job.ubertask.maxbytes
You can find more details here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.15/bk_using-apache-hadoop/content/uber_jobs.html
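As a rough illustration of how the four properties listed above could look in mapred-site.xml, the following sketch generates a `<configuration>` fragment with Python's standard library. The values are assumptions chosen for the example (9 maps, 1 reducer, a 128 MB byte limit), not recommendations, and the helper name `build_configuration` is made up.

```python
import xml.etree.ElementTree as ET

# Example values only (assumptions); tune them for your own cluster.
UBER_SETTINGS = {
    "mapreduce.job.ubertask.enable": "true",
    "mapreduce.job.ubertask.maxmaps": "9",
    "mapreduce.job.ubertask.maxreduces": "1",
    "mapreduce.job.ubertask.maxbytes": str(128 * 1024 * 1024),
}


def build_configuration(settings):
    """Build a Hadoop-style <configuration> element from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    config = build_configuration(UBER_SETTINGS)
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(config)
    print(ET.tostring(config, encoding="unicode"))
```

The printed `<property>` elements can be merged into an existing mapred-site.xml; the script only produces the fragment and does not touch any cluster files.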
Upvotes: 11
Reputation: 2632
In Hadoop 2.x, uber jobs are jobs that are launched in the MapReduce ApplicationMaster itself, i.e. no separate containers are created for the map and reduce tasks, and hence the overhead of creating containers and communicating with them is saved.
As far as how it works in Hadoop 1.x versus 2.x is concerned, I suppose the only observable difference is in the terminology of 1.x and 2.x; the way it works is the same.
The configuration parameters are the same as those mentioned by Navneet Kumar in his answer.
PS: Use it only with small datasets.
Upvotes: 4