Reputation: 51
I cannot understand the difference between the runOldMapper(...) and runNewMapper(...) methods in the MapTask class. Hadoop chooses between them based on the "useNewApi" value read from JobConf, but where and when in the framework is this parameter set? I think the default value is FALSE for all jobs. We can set it to TRUE by calling JobConf.setUseNewMapper(boolean flag), which sets "mapred.mapper.new-api", but when and why should we set this parameter?
Upvotes: 4
Views: 1664
Reputation: 30089
You're correct in assuming that this behaviour is triggered by the mapred.mapper.new-api configuration property.
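To make the dispatch concrete, here is a minimal sketch in plain Java. It is not the Hadoop source: a Map stands in for JobConf, and the class and method names (MapTaskDispatchSketch, useNewMapper, chooseRunner) are hypothetical. It only models the idea that the boolean stored under mapred.mapper.new-api, false when unset, selects between the two code paths.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the old/new mapper dispatch; NOT the Hadoop source.
// A plain Map stands in for JobConf (assumption for illustration only).
class MapTaskDispatchSketch {
    static final String NEW_API_KEY = "mapred.mapper.new-api";

    // Reads the flag; an unset key is treated as false, matching the
    // default described in the question.
    static boolean useNewMapper(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(NEW_API_KEY, "false"));
    }

    // Returns the name of the MapTask method that would be chosen.
    static String chooseRunner(Map<String, String> conf) {
        return useNewMapper(conf) ? "runNewMapper" : "runOldMapper";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseRunner(conf)); // flag unset
        conf.put(NEW_API_KEY, "true");
        System.out.println(chooseRunner(conf)); // flag set to true
    }
}
```

In the real framework the two methods do the actual work of wiring up the record reader, mapper and output collector for the corresponding API; the sketch only shows which branch the flag selects.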
Depending on whether you're using the new or old job conf/client, look in the source for:
org.apache.hadoop.mapreduce.Job.submit() - This method calls the private setUseNewAPI() method, which configures the new-api properties depending on whether the old mapper / reducer class properties are set or not.
org.apache.hadoop.mapred.JobConf - As you note in your question, you the developer need to call setUseNewMapper(true) if you are using a new API mapper implementation. The flag is false by default, which is correct when your mapper class implements the mapred.Mapper interface; set it to true if your mapper extends the mapreduce.Mapper class.
Upvotes: 3
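The rule that the new client applies on submit can be sketched roughly as follows. This is a simplified model, not the Hadoop implementation: a Map stands in for the configuration, the names NewApiFlagSketch and resolveNewApiFlag are hypothetical, and the old-API property name mapred.mapper.class is recalled from Hadoop 1.x rather than taken from the answer above. The idea shown is that an explicitly set flag is left alone, and otherwise the flag becomes true only when no old-API mapper class is configured.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of how the new-api flag gets a value on job submission.
// NOT the Hadoop source; a Map stands in for the job configuration.
class NewApiFlagSketch {
    // Property names as recalled from Hadoop 1.x (assumption).
    static final String OLD_MAPPER_CLASS = "mapred.mapper.class";
    static final String NEW_API_KEY = "mapred.mapper.new-api";

    // Leaves an explicitly set flag untouched; otherwise sets it to true
    // when no old-API mapper class is configured, and to false when one is.
    static void resolveNewApiFlag(Map<String, String> conf) {
        boolean noOldMapper = !conf.containsKey(OLD_MAPPER_CLASS);
        conf.putIfAbsent(NEW_API_KEY, Boolean.toString(noOldMapper));
    }

    public static void main(String[] args) {
        Map<String, String> newStyleJob = new HashMap<>();
        resolveNewApiFlag(newStyleJob);
        System.out.println(newStyleJob.get(NEW_API_KEY));

        Map<String, String> oldStyleJob = new HashMap<>();
        oldStyleJob.put(OLD_MAPPER_CLASS, "com.example.MyOldMapper"); // hypothetical class name
        resolveNewApiFlag(oldStyleJob);
        System.out.println(oldStyleJob.get(NEW_API_KEY));
    }
}
```

With the old JobConf-based client nothing comparable happens automatically, which is why the answer says the developer has to call setUseNewMapper(true) there.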