Reputation: 35
I modified the code below to output words which occurred at least ten times. But it does not work -- the output file does not change at all. What do I have to do to make it work?
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
// ...
public class WordCount extends Configured implements Tool {
// ...
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends
Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
// where I modified, but not working, the output file didnt change
if(sum >= 10)
{
context.write(key, new IntWritable(sum));
}
}
}
public int run(String[] args) throws Exception {
Job job = new Job(getConf());
job.setJarByClass(WordCount.class);
job.setJobName("wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
//job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ret = ToolRunner.run(new WordCount(), args);
System.exit(ret);
}
}
Upvotes: 1
Views: 290
Reputation: 11721
The code is definitely correct, Maybe you are reading the output generated before you modified the code. Or maybe you did not update the jar file which you previously used after modifying the code?
Upvotes: 0
Reputation: 10652
The code looks valid. To be able to help you we need at least the command line you used to run this. It would also help if you could post the actual output if you feed it a file like this
one
two two
three three three
Etc up till 20
Upvotes: 0
Reputation: 1071
You can see the default Hadoop counters and have an idea of whats happening.
Upvotes: 0
Reputation: 8088
Code looks completely valid. I can suspect that your dataset is big enough, so words happens to appear more then 10 times? Please laso make sure that you indeed looking into new results..
Upvotes: 1