HHH

Reputation: 6485

How to set the number of mappers in the new Hadoop API?

I've switched from the old Hadoop API (mapred) to the new one (mapreduce), and now I can't set the number of mappers. I can use job.setNumReduceTasks() to set the number of reducers, but there is no corresponding method for mappers. I also tried conf.setInt("mapred.map.tasks", numMapper) and conf.setInt("mapreduce.map.tasks", numMapper), but neither has any effect.
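For reference, a stripped-down version of the kind of driver I mean (Hadoop 2.x style; the identity mapper/reducer, class name and paths are just placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Attempts to set the number of mappers -- neither has any visible effect:
        conf.setInt("mapred.map.tasks", 10);
        conf.setInt("mapreduce.map.tasks", 10);

        Job job = Job.getInstance(conf, "example");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(Mapper.class);      // identity mapper as a stand-in
        job.setReducerClass(Reducer.class);    // identity reducer as a stand-in
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Reducers are easy to set, but there is no job.setNumMapTasks() in the new API:
        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```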

Upvotes: 0

Views: 4371

Answers (3)

Joel

Reputation: 1690

Starting with Hadoop 2.7, you can use mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit to limit the number of concurrently running map and reduce tasks on a per-job basis.

Fixed by this JIRA ticket.
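A minimal sketch of setting those properties on the job's Configuration before submission (the limit values 10 and 5 are only illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CappedJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cap how many map/reduce tasks of this job may run at the same time.
        // The default of 0 means "no limit"; requires Hadoop 2.7 or later.
        conf.setInt("mapreduce.job.running.map.limit", 10);
        conf.setInt("mapreduce.job.running.reduce.limit", 5);

        Job job = Job.getInstance(conf, "capped-job");
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```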

Upvotes: 1

Aaron

Reputation: 442

In YARN, one can set mapreduce.input.fileinputformat.split.minsize (it is specified in bytes) much higher than the block size of the files being read. This forces more data through each mapper, thereby reducing the number of mappers required. However, some input formats may have their own minimum split size, which takes priority over this setting.
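A minimal sketch of raising the minimum split size in a job driver (the 512 MB value is just an example):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class FewerMappersDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Minimum split size is given in bytes; 512 MB here forces larger splits
        // than the typical 128 MB HDFS block, so fewer mappers are created.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 512L * 1024 * 1024);

        Job job = Job.getInstance(conf, "fewer-mappers");
        // Equivalent helper that sets the same property:
        FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);

        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```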

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77505

The number of map tasks is determined by the input splits you have: each split is processed by one mapper. So essentially, your data determines the number of your mappers!

You can, however, use mapreduce.jobtracker.maxtasks.perjob to limit the parallelism (unfortunately, this affects both mappers and reducers!). If you set it to 10, at most 10 mappers should run in parallel.
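If you want to try it, a minimal sketch of setting that property (the value 10 is illustrative, and this assumes the setting is honored when supplied through the job configuration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LimitedTasksDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Caps the tasks of a single job; note it applies to mappers and reducers alike.
        conf.setInt("mapreduce.jobtracker.maxtasks.perjob", 10);

        Job job = Job.getInstance(conf, "limited-tasks");
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```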

More fine-grained control would be nice, but it is still an open ticket:

MAPREDUCE-5583: Ability to limit running map and reduce tasks

Upvotes: 2
