Codebeginner
Codebeginner

Reputation: 193

How can i have multiple mappers and reducers?

I have this code in which i have set one mapper and one reducer.I want to include one more mapper and a reducer for doing further jobs. The problem is that i have to take the output file of the first map reduce job as the input to the next map reduce job.Is it possible to do that?if yes then how can i do it?

public int run(String[] args) throws Exception 
          {
            JobConf conf = new JobConf(getConf(),DecisionTreec45.class);
            conf.setJobName("c4.5");

            // the keys are words (strings)
            conf.setOutputKeyClass(Text.class);
            // the values are counts (ints)
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(MyMapper.class);
            conf.setReducerClass(MyReducer.class);


            //set your input file path below
            FileInputFormat.setInputPaths(conf, "/home/hduser/Id3_hds/playtennis.txt");
            FileOutputFormat.setOutputPath(conf, new Path("/home/hduser/Id3_hds/1/output"+current_index));
            JobClient.runJob(conf);
            return 0;
          }

Upvotes: 1

Views: 9679

Answers (2)

alekya reddy
alekya reddy

Reputation: 934

Look at how this works.

You need to have two jobs. Job2 is dependent on job1.

public class ChainJobs extends Configured implements Tool {

 private static final String OUTPUT_PATH = "intermediate_output";

 @Override
 public int run(String[] args) throws Exception {
  /*
   * Job 1
   */
  Configuration conf = getConf();
  FileSystem fs = FileSystem.get(conf);
  Job job = new Job(conf, "Job1");
  job.setJarByClass(ChainJobs.class);

  job.setMapperClass(MyMapper1.class);
  job.setReducerClass(MyReducer1.class);

  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);

  job.setInputFormatClass(TextInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);

  TextInputFormat.addInputPath(job, new Path(args[0]));
  TextOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

  job.waitForCompletion(true); /*this goes to next command after this job is completed. your second job is dependent on your first job.*/


  /*
   * Job 2
   */
  Configuration conf2 = getConf();
  Job job2 = new Job(conf2, "Job 2");
  job2.setJarByClass(ChainJobs.class);

  job2.setMapperClass(MyMapper2.class);
  job2.setReducerClass(MyReducer2.class);

  job2.setOutputKeyClass(Text.class);
  job2.setOutputValueClass(Text.class);

  job2.setInputFormatClass(TextInputFormat.class);
  job2.setOutputFormatClass(TextOutputFormat.class);

  TextInputFormat.addInputPath(job2, new Path(OUTPUT_PATH));
  TextOutputFormat.setOutputPath(job2, new Path(args[1]));

  return job2.waitForCompletion(true) ? 0 : 1;
 }

 /**
  * Method Name: main Return type: none Purpose:Read the arguments from
  * command line and run the Job till completion
  * 
  */
 public static void main(String[] args) throws Exception {
  // TODO Auto-generated method stub
  if (args.length != 2) {
   System.err.println("Enter valid number of arguments <Inputdirectory>  <Outputlocation>");
   System.exit(0);
  }
  ToolRunner.run(new Configuration(), new ChainJobs(), args);
 }
}

Upvotes: 2

alekya reddy
alekya reddy

Reputation: 934

yes its possible to do that. you can check the following tutorial to see how chaining occurs. http://gandhigeet.blogspot.com/2012/12/as-discussed-in-previous-post-hadoop.html

Make sure you delete the intermediate output data in HDFS which will be created by each MR phase by using fs.delete(intermediateoutputPath);

Upvotes: 2

Related Questions