Reputation: 97
I am trying to count the occurrences of a particular word in a file using Hadoop MapReduce in Java. Both the file and the word should be user input, so I pass the word as a third argument along with the input and output paths (In, Out, Word). However, I cannot find a way to pass the word to the map function. I tried the following, but it did not work: I created a static String variable in the mapper class and assigned the value of my third argument (i.e. the word to be searched) to it, then tried to use this static variable inside the map function. Inside the map function, however, the static variable's value is null, so I cannot get the third argument's value there.
Is there any way to set the value via the JobConf object? Please help. I have pasted my code below.
public class MyWordCount {

    public static class MyWordCountMap extends Mapper<Text, Text, Text, LongWritable> {
        static String wordToSearch;
        private final static LongWritable ONE = new LongWritable(1L);
        private Text word = new Text();

        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            System.out.println(wordToSearch); // Here the value is coming as null
            if (value.toString().compareTo(wordToSearch) == 0) {
                context.write(word, ONE);
            }
        }
    }

    public static class SumReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
        public void reduce(Text key, Iterator<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            long sum = 0L;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] rawArgs) throws Exception {
        GenericOptionsParser parser = new GenericOptionsParser(rawArgs);
        Configuration conf = parser.getConfiguration();
        String[] args = parser.getRemainingArgs();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(MyWordCountMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(MyWordCountMap.class);
        job.setReducerClass(SumReduce.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        String MyWord = args[2];
        MyWordCountMap.wordToSearch = MyWord;
        job.waitForCompletion(true);
    }
}
Upvotes: 2
Views: 7975
Reputation: 903
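You can do this with the job's Configuration (see its API documentation). The static field is null because it is assigned in the driver's JVM, while the map tasks run in separate JVMs; values stored in the Configuration are shipped to the tasks with the job. As an example, the following code sets "Tree" as the word to be searched:
// Create a new configuration
Configuration conf = new Configuration();
// Set the word to be searched
conf.set("wordToSearch", "Tree");
// Create the job only after setting the value, because Job takes a copy of the configuration
Job job = new Job(conf);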
Then, in your mapper/reducer class, you can retrieve wordToSearch (i.e., "Tree" in this example) as follows:
// Get the job's configuration from the task context
Configuration conf = context.getConfiguration();
// Retrieve the wordToSearch value
String wordToSearch = conf.get("wordToSearch");
See here for more details.
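Putting the two snippets together with the code from the question, a minimal sketch might look like the following. Several things here are my assumptions rather than anything from the question: the class names WordSearchCount, SearchMapper and SumReducer, the WORD_KEY constant, the countMatches helper, and the rule that a "match" is a whitespace-separated token equal to the word (the original compared the whole value). It also assumes Hadoop 2 or later for Job.getInstance; on older releases new Job(conf, "wordcount") plays the same role.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordSearchCount {

    // Hypothetical key name; any string works as long as driver and mapper agree.
    static final String WORD_KEY = "wordToSearch";

    // Assumed matching rule: count whitespace-separated tokens equal to target.
    static long countMatches(String line, String target) {
        long n = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.equals(target)) {
                n++;
            }
        }
        return n;
    }

    public static class SearchMapper extends Mapper<Text, Text, Text, LongWritable> {
        private final Text word = new Text();
        private final LongWritable count = new LongWritable();

        @Override
        protected void setup(Context context) {
            // Runs once per task, inside the task's own JVM.
            word.set(context.getConfiguration().get(WORD_KEY));
        }

        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            long n = countMatches(value.toString(), word.toString());
            if (n > 0) {
                count.set(n);
                context.write(word, count);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0L;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] rawArgs) throws Exception {
        GenericOptionsParser parser = new GenericOptionsParser(rawArgs);
        Configuration conf = parser.getConfiguration();
        String[] args = parser.getRemainingArgs(); // In, Out, Word

        // Set the word BEFORE creating the Job: the Job copies the configuration.
        conf.set(WORD_KEY, args[2]);

        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordSearchCount.class);
        job.setMapperClass(SearchMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Two side notes on the original code. The mapper there never sets its output key (word), so it would write an empty key even once the lookup works. The reducer declares reduce(Text, Iterator<LongWritable>, Context), but the new API expects Iterable<LongWritable>, so that method does not override Reducer.reduce and the default identity reduce would run instead; adding @Override, as above, makes the compiler catch this.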
Upvotes: 5