Reputation: 3069
My MapReduce-based Hive SQL runs on YARN, and the Hadoop version is 2.7.2. What I want is to restrict the number of mapper or reducer tasks running simultaneously when a Hive query is really big. I have tried the following parameters, but they are not what I want:
mapreduce.tasktracker.reduce.tasks.maximum: The maximum number of reduce tasks that will be run simultaneously by a task tracker.
mapreduce.tasktracker.map.tasks.maximum: The maximum number of map tasks that will be run simultaneously by a task tracker.
The above two parameters seem unavailable for my YARN cluster, because the JobTracker/TaskTracker are Hadoop 1.x concepts and YARN has neither. I have also checked an application of mine with more than 20 mappers running, while mapreduce.tasktracker.reduce.tasks.maximum was still at its default value of 2.
Then I tried the following two parameters; they are not what I need either:
mapreduce.job.maps: The default number of map tasks per job. Ignored when mapreduce.jobtracker.address is "local".
mapreduce.job.reduces: The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapreduce.jobtracker.address is "local".
mapreduce.job.maps is just a hint for how many splits will be created for map tasks, and mapreduce.job.reduces defines how many reducers will be generated in total.
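For reference, both can be set per Hive session, but they only size the job rather than throttle it; a sketch with illustrative values:

```sql
-- Session-level settings in Hive; they control how many tasks the
-- job *creates*, not how many run at the same time.
SET mapreduce.job.reduces=10;  -- total reducers generated for the query
```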
But what I want to limit is how many mapper or reducer tasks are allowed to run simultaneously for each application.
In the screenshot below, one YARN application has 20+ mapper tasks running, which costs too much cluster resource. I want to limit it to 10 at most.
So, what can I do?
Upvotes: 0
Views: 1348
Reputation: 71
There may be several questions here. First of all, to control whether the reducers for a particular job run at the same time as the mappers, or only after all of the mappers have completed, you need to tweak mapreduce.job.reduce.slowstart.completedmaps.
In Hadoop 2.x this parameter defaults to 0.05, which means the reducers can be scheduled once 5% of the mappers have completed. If you want the reducers to wait until all of the mappers are complete, you need to set this to 1.
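As a sketch, this can be set per Hive session:

```sql
-- Hold all reducers back until every mapper has finished
SET mapreduce.job.reduce.slowstart.completedmaps=1.0;
```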
As for controlling the number of mappers running at one time, you need to look at setting up either the Fair Scheduler or the Capacity Scheduler.
Using one of these schedulers you can set minimum and maximum resources for the queue where a job runs, which controls how many containers (mappers and reducers each run in a container in YARN) run at one time.
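For example, with the Fair Scheduler, a fair-scheduler.xml entry along these lines caps a queue's resources; the queue name "hive" and the numbers are illustrative assumptions, not from the question:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml sketch: with 10 vcores available to the queue,
     at most ~10 single-vcore containers (mappers/reducers) run at once -->
<allocations>
  <queue name="hive">
    <maxResources>10240 mb,10 vcores</maxResources>
  </queue>
</allocations>
```

Point YARN at this file via yarn.scheduler.fair.allocation.file in yarn-site.xml, and submit the Hive job to that queue.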
There is good information out there about both schedulers:
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
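Similarly, with the Capacity Scheduler, queue capacities in capacity-scheduler.xml bound how much of the cluster a queue (and hence one job's containers) can occupy; the queue name and percentages below are illustrative:

```xml
<!-- capacity-scheduler.xml sketch: cap the "hive" queue at 20% of the cluster -->
<property>
  <name>yarn.scheduler.capacity.root.hive.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
  <value>20</value>
</property>
```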
Upvotes: 1