DevHelp

Reputation: 305

Get Top N items from mapper output - Mapreduce

My mapper task returns the following output:

2   c
2   g
3   a
3   b
6   r

I have written a reducer and a key comparator that produce correctly sorted output, but how do I get only the top 3 (top N by count) from the mapper output?

public static class WLReducer2 extends
        Reducer<IntWritable, Text, Text, IntWritable> {

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {

        for (Text x : values) {
            context.write(new Text(x), key);
        }

    }

}

public static class KeyComparator extends WritableComparator {
    protected KeyComparator() {
        super(IntWritable.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        IntWritable ip1 = (IntWritable) w1;
        IntWritable ip2 = (IntWritable) w2;
        // Reverse the natural order so larger counts sort first.
        return ip2.compareTo(ip1);
    }
}

This is the reducer output:

r   6
b   3
a   3
g   2
c   2

The expected reducer output is the top 3 by count:

r   6
b   3
a   3

Upvotes: 0

Views: 3733

Answers (2)

RojoSam

Reputation: 1496

If your Top-N elements fit in memory and the job can be aggregated by a single reducer, you can use a TreeMap to hold the Top-N elements.

  1. Instantiate a TreeMap instance variable in the setup() method of your reducer.
  2. Inside your reduce() method, aggregate all the values for the key group. While the TreeMap holds fewer than N entries, insert the result with map.put(value, item). Once it holds N entries, compare the result with the first (lowest) key in the tree, map.firstKey(). If the current value is bigger than that lowest key, insert it with map.put(value, item) and then delete the lowest entry with map.remove(map.firstKey()).
  3. In the reducer's cleanup() method, write all the TreeMap's elements to the output in the required order.

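Note: the number you compare records by must be the key of the TreeMap, and the TreeMap's value should be the description, tag, letter, etc. associated with that number. Because a TreeMap holds only one value per key, items with the same count (a and b in the question) overwrite each other unless the value is a collection.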
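The steps above can be sketched in plain Java, without the Hadoop classes, to show just the TreeMap bookkeeping. This is a minimal sketch under assumptions, not the answer's exact code: the class name TopNSketch and the methods offer() and results() are hypothetical, and each count maps to a list of items so that ties are kept instead of overwritten. In a real reducer, offer() would be called from reduce() and results() from cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tracks the n highest-count items seen so far.
class TopNSketch {
    // Key = count, value = every item with that count (assumption: keeps ties).
    private final TreeMap<Integer, List<String>> top = new TreeMap<>();
    private final int n;
    private int size = 0; // total number of items currently held

    TopNSketch(int n) {
        this.n = n;
    }

    // Would be called from reduce() once per (count, item) pair.
    void offer(int count, String item) {
        top.computeIfAbsent(count, k -> new ArrayList<>()).add(item);
        size++;
        if (size > n) {
            // Over capacity: drop one item from the lowest-count bucket.
            Map.Entry<Integer, List<String>> lowest = top.firstEntry();
            List<String> bucket = lowest.getValue();
            bucket.remove(bucket.size() - 1);
            if (bucket.isEmpty()) {
                top.remove(lowest.getKey());
            }
            size--;
        }
    }

    // Would be called from cleanup(): lines of "item<TAB>count", highest count first.
    List<String> results() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : top.descendingMap().entrySet()) {
            for (String item : e.getValue()) {
                out.add(item + "\t" + e.getKey());
            }
        }
        return out;
    }
}
```

Fed the five pairs from the question with n = 3, the sketch keeps r, a and b. When several items tie at the cutoff count, which of them is evicted depends on arrival order.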

Upvotes: 1

Vignesh I

Reputation: 2221

Restrict the output from your reducer, something like this:

public static class WLReducer2 extends
        Reducer<IntWritable, Text, Text, IntWritable> {
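    // Number of records written so far by this reducer instance.
    private int count = 0;

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {

        for (Text x : values) {
            // With the descending KeyComparator, keys arrive highest first,
            // so the first 3 records written are the top 3.
            if (count < 3) {
                context.write(new Text(x), key);
                count++;
            }
        }

    }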
}

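Set the number of reducers to 1 with job.setNumReduceTasks(1). The counter is kept per reducer instance, so with more than one reducer each would emit its own top 3.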

Upvotes: 3
