Reputation: 57
How do I programmatically add tasks to Hadoop and run them from my Java application? Any ideas? Thanks.
Upvotes: 2
Views: 1168
Reputation: 20969
In Java this is quite easy:
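import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Goes inside a main method declared with "throws Exception"; the YOUR_* names are placeholders.
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(YOUR_MAPPER.class);
job.setMapperClass(YOUR_MAPPER.class);
job.setReducerClass(YOUR_REDUCER.class);
job.setOutputKeyClass(YOUR_OUTPUT_KEY.class);
job.setOutputValueClass(YOUR_OUTPUT_VALUE.class);
FileInputFormat.addInputPath(job, new Path("YOUR_INPUT_PATH"));
FileOutputFormat.setOutputPath(job, new Path("YOUR_OUTPUT_PATH"));
// Blocks until the job finishes; passing true prints progress to the console.
System.exit(job.waitForCompletion(true) ? 0 : 1);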
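If you need to submit it to a cluster, set these values on the configuration object before you create the Job (the Job takes its own copy of the configuration, so later changes to conf are not picked up):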
conf.set("fs.default.name", "hdfs://localhost:9000");
conf.set("mapred.job.tracker", "localhost:9001");
Replace the hostname and ports with the values configured in the cluster's conf directory (fs.default.name is normally set in core-site.xml and mapred.job.tracker in mapred-site.xml).
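If you would rather not hard-code the addresses, one option is to build the two values from parameters and apply them to the configuration in a loop. This is only a rough sketch: the ClusterSettings class and its forCluster method are names I made up for illustration (they are not part of Hadoop), and the host and port values are the same example values as above. It uses plain JDK classes only, so it runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): builds the two cluster settings. */
public class ClusterSettings {

    /**
     * Maps the two property names used above to values built from
     * the given NameNode and JobTracker addresses.
     */
    public static Map<String, String> forCluster(String nameNodeHost, int nameNodePort,
                                                 String jobTrackerHost, int jobTrackerPort) {
        Map<String, String> settings = new LinkedHashMap<>();
        // File system URI of the form hdfs://host:port
        settings.put("fs.default.name", "hdfs://" + nameNodeHost + ":" + nameNodePort);
        // JobTracker address of the form host:port
        settings.put("mapred.job.tracker", jobTrackerHost + ":" + jobTrackerPort);
        return settings;
    }

    public static void main(String[] args) {
        // Example values only; take the real ones from your cluster's configuration.
        Map<String, String> settings = forCluster("localhost", 9000, "localhost", 9001);
        for (Map.Entry<String, String> e : settings.entrySet()) {
            // In the real driver you would call conf.set(e.getKey(), e.getValue()) here.
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

In the actual driver, run the loop with conf.set(...) right after creating the Configuration and before constructing the Job.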
Upvotes: 4