Reputation: 619
I'm a novice on hadoop, I'm getting familiar to the style of map-reduce programing but now I faced a problem : Sometimes I need only map for a job and I only need the map result directly as output, which means reduce phase is not needed here, how can I achive that?
Upvotes: 43
Views: 33896
Reputation: 787
If you are using oozie as a scheduler to manager your hadoop jobs, then you can just set the property mapred.reduce.tasks(which is the default number of reduce tasks per job) to 0. You can add your mapper in the property mapreduce.map.class, and also there will be no need to add the property mapreduce.reduce.class since reducers are not required.
<configuration>
<property>
<name>mapreduce.map.class</name>
<value>my.com.package.AbcMapper</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>0</value>
</property>
.
.
.
<configuration>
Upvotes: 0
Reputation: 8937
Can be quite helpful when you need to launch job with mappers only from terminal. You can turn off reducers by specifing 0 reducers in hadoop jar command implicitly:
-D mapred.reduce.tasks=0
So the result command will be following:
hadoop jar myJob.jar -D mapred.reduce.tasks=0 -input myInputDirs -output myOutputDir
To be backward compatible, Hadoop also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".
Upvotes: 5
Reputation: 4579
You can also use the IdentityReducer:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/IdentityReducer.html
Upvotes: 9