Reputation: 619

How to write 'map only' hadoop jobs?

I'm a novice with Hadoop. I'm getting familiar with the MapReduce programming style, but I've run into a problem: sometimes a job needs only the map phase, and I want the map result written directly as output, so the reduce phase is not needed. How can I achieve that?

Upvotes: 43

Answers (4)

Neha Kumari

Reputation: 787

If you are using Oozie as a scheduler to manage your Hadoop jobs, you can just set the property mapred.reduce.tasks (the default number of reduce tasks per job) to 0. Specify your mapper in the property mapreduce.map.class; there is no need to add the property mapreduce.reduce.class, since reducers are not required.

<configuration>
   <property>
     <name>mapreduce.map.class</name>
     <value>my.com.package.AbcMapper</value>
   </property>
   <property>
     <name>mapred.reduce.tasks</name>
     <value>0</value>
   </property>
   .
   .
   .
</configuration>

Upvotes: 0

Alex

Reputation: 8937

This can be quite helpful when you need to launch a mapper-only job from the terminal. You can turn off reducers by specifying 0 reducers directly in the hadoop jar command:

-D mapred.reduce.tasks=0

So the resulting command will be the following:

hadoop jar myJob.jar -D mapred.reduce.tasks=0 -input myInputDirs -output myOutputDir

To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".

Upvotes: 5

Thomas Jungblut

Reputation: 20969

This turns off the reducer.

job.setNumReduceTasks(0);

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
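
For context, here is a minimal sketch of a complete map-only driver built around that call. The class names MapOnlyDriver and UpperCaseMapper, the upper-casing logic, and the two path arguments are made up for illustration; only setNumReduceTasks(0) comes from the answer itself. It assumes the newer org.apache.hadoop.mapreduce API, plain text input, and the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {

    // Hypothetical mapper: upper-cases every input line.
    // The key is NullWritable so the text output contains only the line.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private final Text out = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            out.set(line.toString().toUpperCase());
            context.write(NullWritable.get(), out);
        }
    }

    // Builds the job without submitting it, so the configuration can be inspected.
    static Job buildJob(Configuration conf, String input, String output)
            throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(UpperCaseMapper.class);

        // Zero reduce tasks: no reducer class is set and no reduce phase runs.
        job.setNumReduceTasks(0);

        // With no reducers, these are the types the mappers write to the output.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path (must not exist yet)
        Job job = buildJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reducers there is no shuffle or sort, and each mapper writes its output straight to the output directory as part-m-NNNNN files.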

Upvotes: 59

Peter Wippermann

Reputation: 4579

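You can also use the IdentityReducer, which passes the map output through unchanged. Note that the job then still runs a reduce phase, so the shuffle and sort are not skipped: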

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/IdentityReducer.html

Upvotes: 9
