Reputation: 2921
I am new to Map-Reduce programming paradigm. So, my question may sound utter stupid to many. However, I request all to kindly bear with me.
I am trying to count number of occurrences of a particular word in a file. Now, I wrote following Java classes for that.
The input file for this has following entries:
The tiger entered village in the night the the \
Then ... the story continues...
I have put the word 'the' many times because of my own program purpose.
WordCountMapper.java
package com.demo.map_reduce.word_count.mapper;
import java.io.IOException;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
@SuppressWarnings({ "rawtypes", "unchecked" })
@Override
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
if(null != value) {
final String line = value.toString();
if(StringUtils.containsIgnoreCase(line, "the")) {
context.write(new Text("the"), new IntWritable(StringUtils.countMatches(line, "the")));
}
}
}
}
WordCountReducer.java
package com.demo.map_reduce.word_count.reducer;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
@SuppressWarnings({ "rawtypes", "unchecked" })
public void reduce(Text key, Iterable<IntWritable> values, org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException, InterruptedException {
int count = 0;
for (final IntWritable nextValue : values) {
count += nextValue.get();
}
context.write(key, new IntWritable(count));
}
}
WordCounter.java
package com.demo.map_reduce.word_count;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.demo.map_reduce.word_count.mapper.WordCountMapper;
import com.demo.map_reduce.word_count.reducer.WordCountReducer;
public class WordCounter
{
public static void main(String[] args) {
final String inputDataPath = "/input/my_wordcount_1/input_data_file.txt";
final String outputDataDir = "/output/my_wordcount_1";
try {
final Job job = Job.getInstance();
job.setJobName("Simple word count");
job.setJarByClass(WordCounter.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(inputDataPath));
FileOutputFormat.setOutputPath(job, new Path(outputDataDir));
job.waitForCompletion(true);
}
} catch (Exception e) {
e.printStackTrace();
}
}
I am getting following output when I run this program in Hadoop.
the 2
the 1
the 3
I want the reducer to result
the 4
I am sure I am doing something wrong; or I may not have understood properly. Could someone help me here?
Thanks in advance.
-Niranjan
Upvotes: 0
Views: 478
Reputation: 21883
The problem is you are not normalizing the keywords and you are not counting the words, You are counting the lines that contains the word the
.
Change your map logic to the following
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
if(null != value) {
final String line = value.toString();
for(String word:line.split("\\s+")){
context.write(new Text(word.trim().toLowerCase()), new IntWritable(1));
}
}
}
And reduce logic to following
public void reduce(Text key, Iterable<IntWritable> values, org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException, InterruptedException {
int count = 0;
if(key.toString().trim().toLowerCase().equals("the")) {
for (final IntWritable nextValue : values) {
count += nextValue.get();
}
context.write(key, new IntWritable(count));
}
}
Upvotes: 0
Reputation: 3890
problem is your reduce method is not getting invoked
To make it work just change the signature of reduce function to
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
Upvotes: 1