Karos

Reputation: 51

How does Hadoop decide which mapper to run in the MapTask class, OldMapper or NewMapper?

I cannot understand the difference between the runOldMapper(...) and runNewMapper(...) methods in the MapTask class. Hadoop chooses between them based on the "useNewApi" flag read from JobConf, but where and when in the framework is this flag set? I think the default value is false for all jobs. We can set it to true by calling JobConf.setUseNewMapper(boolean flag), which sets "mapred.mapper.new-api", but when and why should we set this parameter?

Upvotes: 4

Views: 1664

Answers (1)

Chris White

Reputation: 30089

You're correct in the assumption that this behaviour is triggered by the mapred.mapper.new-api configuration.

Depending on whether you're using the new or old job conf/client, look in the source for:

  • org.apache.hadoop.mapreduce.Job.submit() method, which calls the private setUseNewAPI() method. This configures the new-api properties depending on whether the old mapper / reducer class properties are set or not.
  • org.apache.hadoop.mapred.JobConf - As you note in your question, the flag is false by default, which is what you want when your mapper class implements the mapred.Mapper interface. If your mapper instead extends the mapreduce.Mapper class, you the developer will need to call setUseNewMapper(true) yourself.
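
To make the two paths above concrete, here is a minimal sketch of the decision logic. It is a simplified model and not Hadoop's actual source: a plain Map stands in for the job configuration so the snippet runs without Hadoop on the classpath. The class NewApiSketch and its methods are hypothetical names, the property key "mapred.mapper.class" for the old-API mapper is assumed from Hadoop 1.x, and the "only set the flag if it is not already set" behaviour is also an assumption about how setUseNewAPI() works.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the new-api decision; not Hadoop code.
class NewApiSketch {
    // Property named in the question.
    static final String NEW_API_KEY = "mapred.mapper.new-api";
    // Assumed key under which the old-API mapper class is stored (Hadoop 1.x).
    static final String OLD_MAPPER_CLASS_KEY = "mapred.mapper.class";

    // Models what Job.submit() is described as doing: if the flag is not
    // already set, choose the new API only when no old mapper class is configured.
    static void setUseNewApiIfUnset(Map<String, String> conf) {
        if (!conf.containsKey(NEW_API_KEY)) {
            boolean oldMapperConfigured = conf.containsKey(OLD_MAPPER_CLASS_KEY);
            conf.put(NEW_API_KEY, String.valueOf(!oldMapperConfigured));
        }
    }

    // Models the branch in MapTask: the flag defaults to false when absent.
    static String chooseRunner(Map<String, String> conf) {
        boolean useNewApi =
            Boolean.parseBoolean(conf.getOrDefault(NEW_API_KEY, "false"));
        return useNewApi ? "runNewMapper" : "runOldMapper";
    }

    public static void main(String[] args) {
        // New job client, old-API mapper class configured.
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put(OLD_MAPPER_CLASS_KEY, "com.example.MyOldMapper");
        setUseNewApiIfUnset(oldStyle);
        System.out.println(chooseRunner(oldStyle));

        // New job client, no old-API mapper class configured.
        Map<String, String> newStyle = new HashMap<>();
        setUseNewApiIfUnset(newStyle);
        System.out.println(chooseRunner(newStyle));

        // Old JobConf path where the developer never set the flag.
        Map<String, String> untouched = new HashMap<>();
        System.out.println(chooseRunner(untouched));
    }
}
```

In a real job submitted through the old JobConf/JobClient path, the third case is the one to watch: nothing sets the flag for you, so a mapreduce.Mapper subclass needs an explicit setUseNewMapper(true).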

Upvotes: 3
