Reputation: 3069
My MapReduce-based Hive SQL runs on YARN, and the Hadoop version is 2.7.2. What I want is to restrict the number of mapper or reducer tasks running simultaneously when a Hive query is really big. I have tried the following parameters, but they are not what I want:
mapreduce.tasktracker.reduce.tasks.maximum: The maximum number of reduce tasks that will be run simultaneously by a task tracker.
mapreduce.tasktracker.map.tasks.maximum: The maximum number of map tasks that will be run simultaneously by a task tracker.
The above two parameters seem unavailable for my YARN cluster, because the JobTracker/TaskTracker are Hadoop 1.x concepts and YARN has neither. I have also checked an application of mine with more than 20 mappers running, while mapreduce.tasktracker.reduce.tasks.maximum was still at its default value of 2.
Then I tried the following two parameters; they are not what I need either:
mapreduce.job.maps: The default number of map tasks per job. Ignored when mapreduce.jobtracker.address is "local".
mapreduce.job.reduces: The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapreduce.jobtracker.address is "local".
mapreduce.job.maps is just a hint for how many splits will be created for map tasks, and mapreduce.job.reduces defines how many reducers will be generated in total.
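For reference, both can be set per Hive session, but they only size the job rather than throttle it; a sketch with illustrative values:

```sql
-- Session-level settings in Hive; they control how many tasks the
-- job *creates*, not how many run at the same time.
SET mapreduce.job.reduces=10;  -- total reducers generated for the query
```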
But what I want to limit is how many mapper or reducer tasks are allowed to run simultaneously for each application.
In the screenshot below, one YARN application has 20+ mapper tasks running, which costs too much cluster resource. I want to limit it to 10 at most.
So, what can I do?
Upvotes: 0
Views: 1348
Reputation: 71
There may be several questions here. First of all, to control whether the reducers for a particular job run at the same time as the mappers, or only after all of the mappers have completed, you need to tweak mapreduce.job.reduce.slowstart.completedmaps.
In Hadoop 2.x this parameter defaults to 0.05, which means the reducers can be scheduled once 5% of the mappers have completed. If you want the reducers to wait until all of the mappers are complete, you need to set this to 1.
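As a sketch, this can be set per Hive session:

```sql
-- Hold all reducers back until every mapper has finished
SET mapreduce.job.reduce.slowstart.completedmaps=1.0;
```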
As for controlling the number of mappers running at one time, you need to look at setting up either the Fair Scheduler or the Capacity Scheduler.
Using one of these schedulers you can set minimum and maximum resources for the queue where a job runs, which controls how many containers (mappers and reducers each run in a container in YARN) run at one time.
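For example, with the Fair Scheduler, a fair-scheduler.xml entry along these lines caps a queue's resources; the queue name "hive" and the numbers are illustrative assumptions, not from the question:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml sketch: with 10 vcores available to the queue,
     at most ~10 single-vcore containers (mappers/reducers) run at once -->
<allocations>
  <queue name="hive">
    <maxResources>10240 mb,10 vcores</maxResources>
  </queue>
</allocations>
```

Point YARN at this file via yarn.scheduler.fair.allocation.file in yarn-site.xml, and submit the Hive job to that queue.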
There is good information out there about both schedulers:
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
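Similarly, with the Capacity Scheduler, queue capacities in capacity-scheduler.xml bound how much of the cluster a queue (and hence one job's containers) can occupy; the queue name and percentages below are illustrative:

```xml
<!-- capacity-scheduler.xml sketch: cap the "hive" queue at 20% of the cluster -->
<property>
  <name>yarn.scheduler.capacity.root.hive.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
  <value>20</value>
</property>
```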
Upvotes: 1