MapReduce related - what am I doing wrong here?

Question

I am new to Map-Reduce programming paradigm. So, my question may sound utter stupid to many. However, I request all to kindly bear with me.

I am trying to count number of occurrences of a particular word in a file. Now, I wrote following Java classes for that.

The input file for this has following entries:

The tiger entered village in the night the the \
Then ... the story continues...
I have put the word 'the' many times because of my own program purpose.

WordCountMapper.java

package com.demo.map_reduce.word_count.mapper;

import java.io.IOException;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper
{
@SuppressWarnings({ "rawtypes", "unchecked" })
@Override
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
       if(null != value) {
          final String line = value.toString();
          if(StringUtils.containsIgnoreCase(line, "the")) {
             context.write(new Text("the"), new IntWritable(StringUtils.countMatches(line, "the")));
          }
       }
    }
}

WordCountReducer.java

package com.demo.map_reduce.word_count.reducer;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer
{
   @SuppressWarnings({ "rawtypes", "unchecked" })
   public void reduce(Text key, Iterable values, org.apache.hadoop.mapreduce.Reducer.Context context)
        throws IOException, InterruptedException {
          int count = 0;
      for (final IntWritable nextValue : values) {
             count += nextValue.get();
          }
          context.write(key, new IntWritable(count));
    }
}

WordCounter.java

package com.demo.map_reduce.word_count;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.demo.map_reduce.word_count.mapper.WordCountMapper;
import com.demo.map_reduce.word_count.reducer.WordCountReducer;

public class WordCounter
{
    public static void main(String[] args) {
        final String inputDataPath = "/input/my_wordcount_1/input_data_file.txt";
        final String outputDataDir = "/output/my_wordcount_1";
        try {
            final Job job = Job.getInstance();
            job.setJobName("Simple word count");
            job.setJarByClass(WordCounter.class);

            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(inputDataPath));
            FileOutputFormat.setOutputPath(job, new Path(outputDataDir));

            job.waitForCompletion(true);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

I am getting following output when I run this program in Hadoop.

the 2
the 1
the 3

I want the reducer to result

the 4

I am sure I am doing something wrong; or I may not have understood properly. Could someone help me here?

Thanks in advance.

-Niranjan

banjara · Accepted Answer

problem is your reduce method is not getting invoked
To make it work just change the signature of reduce function to

public void reduce(Text key, Iterable values, Context context)
        throws IOException, InterruptedException {

MapReduce related - what am I doing wrong here?

Answers (2)

Related Questions