Priya v v

Reputation: 163

How to get the count of the last word in an input file using a MapReduce program

Can anyone tell me what modifications need to be made to a simple word count program to get only the last word's count from a file using MapReduce?

If the input file is:

hai hello world
hello world java
hadoop world hai
hello hai java

Expected output: world 3

As 'world' will be the last key after sorting.

Appreciate any help

Upvotes: 1

Views: 289

Answers (2)

Jagadish Talluri

Reputation: 688

One simple way is available which doesn't need explicit sorting.

Assuming you have one reducer running, you can override the cleanup() method in your reducer class.

The cleanup() method is used in the reducer to do housekeeping activities at the end of the reduce task.

But you can make use of it here, because cleanup() is executed only once, after all the reduce() calls have finished.

By the end of your reduce task you will be holding only the last key-value pair. Now, instead of emitting that output from the reduce() method, emit it from the cleanup() method.

You can keep your context.write() only inside cleanup().

@Override
protected void cleanup(Context context)
        throws IOException, InterruptedException {
    // write the last key-value pair remembered by reduce() here
    context.write(lastKey, lastValue); // lastKey/lastValue: fields you set in reduce()
}

I believe this does your job effortlessly and you will get the desired result instantly by using the above 3 lines of code.
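The idea can be checked in plain Java without a Hadoop runtime. In the sketch below (class and method names are illustrative, not part of any Hadoop API), a TreeMap stands in for the sorted keys a single reducer would receive: each "reduce call" just overwrites the remembered pair, and the "cleanup step" emits whatever was remembered last.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class LastWordCount {

    // Stand-in for a single reducer: keys arrive in sorted order,
    // each "reduce() call" overwrites the remembered pair, and the
    // "cleanup()" step returns the final one.
    static Map.Entry<String, Integer> lastPair(SortedMap<String, Integer> counts) {
        Map.Entry<String, Integer> last = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            last = e;    // reduce(): remember, don't emit
        }
        return last;     // cleanup(): emit exactly once
    }

    public static void main(String[] args) {
        String input = "hai hello world\n"
                     + "hello world java\n"
                     + "hadoop world hai\n"
                     + "hello hai java";
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String w : input.split("\\s+")) {
            counts.merge(w, 1, Integer::sum);  // the usual word-count step
        }
        Map.Entry<String, Integer> last = lastPair(counts);
        System.out.println(last.getKey() + " " + last.getValue()); // world 3
    }
}
```

Running it on the question's sample input prints `world 3`, matching the expected output.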

Upvotes: 2

Vignesh I

Reputation: 2221

Set the number of reducers to 1. On the map side, override the default sort order so keys are sorted in descending order, and register the comparator class in your driver code with job.setSortComparatorClass(). Then emit only the first key-value pair from your reduce call.

public class MysortComparator extends WritableComparator
{
    protected MysortComparator()
    {
        super(Text.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w, WritableComparable w1)
    {
        Text s = (Text) w;
        Text s1 = (Text) w1;
        // negate the natural ordering so keys arrive in descending order
        return -1 * s.compareTo(s1);
    }
}

Also, you can override the reducer's run() method to read only the first record, pass it to the reduce call, and ignore the remaining records. This avoids the overhead if your single reducer would otherwise receive a large number of key/value pairs.

@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  int rec_cnt = 0;
  // after the descending sort, the first key is the last one alphabetically
  while (context.nextKey() && rec_cnt++ < 1) {
    reduce(context.getCurrentKey(), context.getValues(), context);
  }
  cleanup(context);
}
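The combined effect of the descending comparator and the truncated run() loop can also be checked in plain Java (names below are illustrative, not Hadoop API): sort the distinct words in reverse order and take only the first key.

```java
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;

public class FirstKeyDescending {

    // Mimics MysortComparator + the truncated run() loop: keys are
    // ordered descending, and only the first one reaches reduce().
    static String firstKeyDescending(String text) {
        SortedMap<String, Integer> counts =
                new TreeMap<>(Comparator.reverseOrder()); // descending sort
        for (String w : text.split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
        }
        // "rec_cnt < 1": process the first key only
        String key = counts.firstKey();
        return key + " " + counts.get(key);
    }

    public static void main(String[] args) {
        String input = "hai hello world hello world java hadoop world hai hello hai java";
        System.out.println(firstKeyDescending(input)); // world 3
    }
}
```

With keys in descending order, the alphabetically last word ('world') arrives first, so reading a single record is enough.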

Upvotes: 1
