Breakinen

Reputation: 619

How to write 'map only' hadoop jobs?

I'm a novice with Hadoop and am getting familiar with the map-reduce style of programming, but I've now run into a problem: sometimes I need only the map phase of a job, and I want the map result written directly as output, so the reduce phase is not needed. How can I achieve that?

Upvotes: 43

Views: 33896

Answers (4)

Neha Kumari

Reputation: 787

If you are using Oozie as a scheduler to manage your Hadoop jobs, you can just set the property mapred.reduce.tasks (the default number of reduce tasks per job) to 0. Specify your mapper in the property mapreduce.map.class; there is no need to add the property mapreduce.reduce.class, since reducers are not required.

<configuration>
   <property>
     <name>mapreduce.map.class</name>
     <value>my.com.package.AbcMapper</value>
   </property>
   <property>
     <name>mapred.reduce.tasks</name>
     <value>0</value>
   </property>
   .
   .
   .
</configuration>

Upvotes: 0

Alex

Reputation: 8937

This can be quite helpful when you need to launch a mapper-only job from the terminal. You can turn off the reducers by explicitly specifying 0 reducers in the hadoop jar command:

-D mapred.reduce.tasks=0 

So the resulting command will be the following:

hadoop jar myJob.jar -D mapred.reduce.tasks=0 -input myInputDirs -output myOutputDir

To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".

Upvotes: 5

Thomas Jungblut

Reputation: 20969

This turns off the reducer.

job.setNumReduceTasks(0);

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
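For context, here is a minimal sketch of where that call could sit in a complete driver. The class name MapOnlyJob, the UpperCaseMapper, and the buildJob helper are hypothetical names made up for illustration; the sketch assumes the new org.apache.hadoop.mapreduce API, the default TextInputFormat, and that the input and output paths arrive as the two command-line arguments.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    // Hypothetical mapper for illustration: emits each input line
    // upper-cased, keyed by its byte offset in the input file.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(offset, new Text(line.toString().toUpperCase()));
        }
    }

    // Hypothetical helper: configures the job without submitting it.
    public static Job buildJob(Configuration conf, Path input, Path output)
            throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(UpperCaseMapper.class);

        // Zero reducers: no reduce phase runs for this job.
        job.setNumReduceTasks(0);

        // With no reducer, these describe the mapper's output types.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the reducer count at 0, the mapper output is written straight to the output directory (as part-m-* files) and the sort and shuffle steps are skipped, so the output is not sorted by key.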

Upvotes: 59
