Reputation: 6485
I've switched to the new Hadoop API (mapreduce) from the old one (mapred), and I can't find a way to set the number of mappers in the new API. I can use job.setNumReduceTasks()
to set the number of reducers, but there is no equivalent method for the number of mappers. I also tried conf.setInt("mapred.map.tasks", numMapper)
and conf.setInt("mapreduce.map.tasks", numMapper),
but neither has any effect.
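For reference, a stripped-down version of my driver (the identity Mapper/Reducer, types, and paths are just placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Driver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            int numMapper = 10;
            // Neither of these changes the number of mappers:
            conf.setInt("mapred.map.tasks", numMapper);
            conf.setInt("mapreduce.map.tasks", numMapper);

            Job job = Job.getInstance(conf, "example");
            job.setJarByClass(Driver.class);
            job.setMapperClass(Mapper.class);     // identity mapper as placeholder
            job.setReducerClass(Reducer.class);   // identity reducer as placeholder
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            job.setNumReduceTasks(4);             // works for reducers; no setNumMapTasks() exists

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }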
Upvotes: 0
Views: 4371
Reputation: 1690
Starting from Hadoop 2.7, you can use mapreduce.job.running.map.limit
and mapreduce.job.running.reduce.limit
to cap the number of simultaneously running map and reduce tasks at the job level.
This was added by the JIRA ticket MAPREDUCE-5583.
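A minimal sketch of setting these (assuming Hadoop 2.7+; the limit values here are arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class RunningLimitsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Cap how many of this job's map / reduce tasks may run at the same time (Hadoop 2.7+)
            conf.setInt("mapreduce.job.running.map.limit", 10);
            conf.setInt("mapreduce.job.running.reduce.limit", 5);

            Job job = Job.getInstance(conf, "job with running-task limits");
            // ... set mapper, reducer, input/output paths as usual, then job.waitForCompletion(true)
        }
    }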
Upvotes: 1
Reputation: 442
In YARN, one can set mapreduce.input.fileinputformat.split.minsize (the value is in bytes) much higher than the block size of the files being read. This forces more data through each mapper, thereby reducing the number of mappers required. However, some file formats may have their own minimum split size, which takes priority over this setting.
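A sketch of what that looks like in a driver (the 256 MB figure is just an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class MinSplitSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "fewer, larger splits");

            // Ask for splits of at least 256 MB (the property is in bytes),
            // which yields fewer splits and therefore fewer mappers.
            FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
            // Equivalent property form:
            // job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize", 256L * 1024 * 1024);

            // ... set mapper, reducer, input/output paths as usual, then submit
        }
    }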
Upvotes: 0
Reputation: 77505
The number of map tasks is determined by the input splits you have: each split is processed by exactly one mapper, so essentially your data determines the number of your mappers.
You can, however, use mapreduce.jobtracker.maxtasks.perjob
to limit the parallelism (unfortunately, this affects both mappers and reducers!). If you set it to 10, at most 10 mappers should run in parallel.
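As a sketch (the value 10 just mirrors the example above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MaxTasksPerJobExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Caps this job's tasks; note that it applies to mappers and reducers alike
            conf.setInt("mapreduce.jobtracker.maxtasks.perjob", 10);

            Job job = Job.getInstance(conf, "job with capped tasks");
            // ... set mapper, reducer, input/output paths as usual, then submit
        }
    }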
More fine-grained control would be nice, but that is still an open ticket:
MAPREDUCE-5583: Ability to limit running map and reduce tasks
Upvotes: 2