Reputation: 163
Can anyone tell me what modification is needed in a simple word count program to output only the count of the last word in a file using MapReduce?
If the input file is:
hai hello world hello world java hadoop world hai hello hai java
Expected output: world 3
'world' is the expected word because it will be the last key after sorting.
I'd appreciate any help.
Upvotes: 1
Views: 289
Reputation: 688
There is a simple way that doesn't need explicit sorting.
Assuming you have a single reducer running, you can override the cleanup() method in the reducer class.
cleanup() is normally used for housekeeping at the end of the reduce task, but you can make use of it here because it is executed only once, after the last reduce() call.
Keys reach the reducer in sorted order, so if reduce() stores each key and its summed count in instance fields instead of writing them, those fields hold only the last key-value pair by the end of the reduce task. Instead of emitting that output from reduce(), emit it from cleanup().
In other words, keep your context.write() call only inside cleanup().
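@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    // lastKey and lastCount are hypothetical fields that reduce() overwrites on every call.
    // reduce() should copy the key (e.g. lastKey.set(key)), since Hadoop reuses the key object.
    context.write(lastKey, lastCount);
}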
I believe this does the job, and it only takes the few lines of code above.
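To make the idea concrete, here is a minimal plain-Java sketch of the same pattern that runs without Hadoop. Everything in it is an illustrative assumption rather than Hadoop API: the class name LastKeyReducerSketch, the lastKey and lastCount fields, and the use of a TreeMap to stand in for the sorted key order a single reducer would see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a reducer; no Hadoop classes are used here.
class LastKeyReducerSketch {
    private String lastKey;  // most recent key passed to reduce()
    private int lastCount;   // summed count for that key

    // Mirrors reduce(): sum the values, but only remember the result.
    void reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        lastKey = key;
        lastCount = sum;
    }

    // Mirrors cleanup(): called once, after the final reduce() call.
    String cleanup() {
        return lastKey == null ? "" : lastKey + "\t" + lastCount;
    }

    static String run(String input) {
        // TreeMap iterates keys in ascending order, like the default key sort.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        LastKeyReducerSketch reducer = new LastKeyReducerSketch();
        grouped.forEach(reducer::reduce);
        return reducer.cleanup();
    }

    public static void main(String[] args) {
        System.out.println(run("hai hello world hello world java hadoop world hai hello hai java"));
    }
}
```

With the input from the question, the only pair left at the end is the one for 'world', which is what cleanup() emits.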
Upvotes: 2
Reputation: 2221
Set the number of reducers to 1. Then override the default sort order with a comparator that sorts the map output keys in descending order, and register it in your driver code with job.setSortComparatorClass.
With that in place, take only the first key-value pair from your reduce call.
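import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class MysortComparator extends WritableComparator
{
    protected MysortComparator()
    {
        // true: create Text instances so keys can be deserialized for comparison
        super(Text.class, true);
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable w, WritableComparable w1)
    {
        Text s = (Text) w;
        Text s1 = (Text) w1;
        // Negate the natural ordering to sort keys in descending order
        return -1 * s.compareTo(s1);
    }
}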
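The effect of negating compareTo can be checked outside Hadoop. The sketch below is only an illustration: it uses String in place of Text (the two orderings agree for ASCII words like these), and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DescendingOrderSketch {
    // Same trick as the comparator above: flip the sign of the natural ordering.
    static final Comparator<String> DESCENDING = (a, b) -> -1 * a.compareTo(b);

    // Returns the key that would reach the reducer first under descending order.
    static String firstKey(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(DESCENDING);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstKey(Arrays.asList("hai", "hello", "world", "java", "hadoop")));
    }
}
```

Under this ordering the last key of the default sort becomes the first key the reducer sees.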
You can also override the reducer's run method so that it reads only the first record, passes it to the reduce call, and ignores the other records. This avoids the overhead of processing the remaining records when your single reducer receives a large number of key/value pairs.
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    int rec_cnt = 0;
    // Stop after the first key; the remaining keys are never passed to reduce()
    while (context.nextKey() && rec_cnt++ < 1) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
}
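The loop condition above can be tried in isolation. This is a rough plain-Java sketch, assuming an Iterator over already-sorted keys in place of the Hadoop context; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class FirstKeyOnlySketch {
    // Collects the keys that would be handed to reduce() by the loop above.
    static List<String> processFirstKeyOnly(Iterator<String> sortedKeys) {
        List<String> reduced = new ArrayList<>();
        int recCnt = 0;
        // Same guard as in run(): the counter lets exactly one key through.
        while (sortedKeys.hasNext() && recCnt++ < 1) {
            reduced.add(sortedKeys.next());
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Keys listed in descending order, as the custom comparator would deliver them.
        List<String> keys = Arrays.asList("world", "java", "hello", "hai", "hadoop");
        System.out.println(processFirstKeyOnly(keys.iterator()));
    }
}
```

Only the first key is processed, and an empty input simply produces no reduce call.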
Upvotes: 1