Reputation: 193
I have this code in which i have set one mapper and one reducer.I want to include one more mapper and a reducer for doing further jobs. The problem is that i have to take the output file of the first map reduce job as the input to the next map reduce job.Is it possible to do that?if yes then how can i do it?
public int run(String[] args) throws Exception
{
JobConf conf = new JobConf(getConf(),DecisionTreec45.class);
conf.setJobName("c4.5");
// the keys are words (strings)
conf.setOutputKeyClass(Text.class);
// the values are counts (ints)
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(MyMapper.class);
conf.setReducerClass(MyReducer.class);
//set your input file path below
FileInputFormat.setInputPaths(conf, "/home/hduser/Id3_hds/playtennis.txt");
FileOutputFormat.setOutputPath(conf, new Path("/home/hduser/Id3_hds/1/output"+current_index));
JobClient.runJob(conf);
return 0;
}
Upvotes: 1
Views: 9679
Reputation: 934
Look at how this works.
You need to have two jobs. Job2 is dependent on job1.
public class ChainJobs extends Configured implements Tool {
private static final String OUTPUT_PATH = "intermediate_output";
@Override
public int run(String[] args) throws Exception {
/*
* Job 1
*/
Configuration conf = getConf();
FileSystem fs = FileSystem.get(conf);
Job job = new Job(conf, "Job1");
job.setJarByClass(ChainJobs.class);
job.setMapperClass(MyMapper1.class);
job.setReducerClass(MyReducer1.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
TextInputFormat.addInputPath(job, new Path(args[0]));
TextOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
job.waitForCompletion(true); /*this goes to next command after this job is completed. your second job is dependent on your first job.*/
/*
* Job 2
*/
Configuration conf2 = getConf();
Job job2 = new Job(conf2, "Job 2");
job2.setJarByClass(ChainJobs.class);
job2.setMapperClass(MyMapper2.class);
job2.setReducerClass(MyReducer2.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
TextInputFormat.addInputPath(job2, new Path(OUTPUT_PATH));
TextOutputFormat.setOutputPath(job2, new Path(args[1]));
return job2.waitForCompletion(true) ? 0 : 1;
}
/**
* Method Name: main Return type: none Purpose:Read the arguments from
* command line and run the Job till completion
*
*/
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
if (args.length != 2) {
System.err.println("Enter valid number of arguments <Inputdirectory> <Outputlocation>");
System.exit(0);
}
ToolRunner.run(new Configuration(), new ChainJobs(), args);
}
}
Upvotes: 2
Reputation: 934
yes its possible to do that. you can check the following tutorial to see how chaining occurs. http://gandhigeet.blogspot.com/2012/12/as-discussed-in-previous-post-hadoop.html
Make sure you delete the intermediate output data in HDFS which will be created by each MR phase by using fs.delete(intermediateoutputPath);
Upvotes: 2