How to do a general reduce for all key/value pairs in hadoop

Question

I am new to Hadoop and I am trying to run some map/reduce tasks in Java. I was wondering how we can execute a reduce operation over all the key/value pairs at once.

For example, imagine we have the highest temperature for each day of a month. We take the day as the key and the temperature as the value, and I want to get the key/value pair with the highest temperature for the whole month.

I hope my question is clear!

Thank you for your help.

harpun · Accepted Answer

Yes, it's possible. Configure your job to use a single reducer with job.setNumReduceTasks(1); that one reducer then receives every key/value pair emitted by the mappers, grouped by key. In the reduce() method you keep track of the maximum across all calls, and in the cleanup() method, which runs once after the last reduce() call, you output the final result. Example with (k, v) = (year, temperature):

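import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Requires job.setNumReduceTasks(1) so that this one instance sees every key.
public class MaxTemperatureReducer
        extends Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {
    // Instance fields (not static): they hold the running maximum for this task.
    private int year = 0;
    private double maxTemp = Double.NEGATIVE_INFINITY; // also works for sub-zero temperatures
    private boolean seen = false;

    @Override
    public void reduce(IntWritable key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key; compare against the maximum kept from earlier calls.
        for (DoubleWritable value : values) {
            if (value.get() > maxTemp) {
                year = key.get();
                maxTemp = value.get();
                seen = true;
            }
        }
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // Called once after the last reduce() call: emit the overall maximum, if any.
        if (seen) {
            context.write(new IntWritable(year), new DoubleWritable(maxTemp));
        }
    }
}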
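To see the control flow without a cluster, here is a plain-Java sketch (no Hadoop classes; Java 9+ for List.of and Map.entry) that imitates what the framework does with a single reduce task: it groups the map output by key in sorted order, calls reduce() once per key on the same reducer instance, and then calls cleanup() once. The names SingleReducerMax, SimReducer and run, as well as the sample (day, temperature) data, are hypothetical and only for illustration; this is not Hadoop's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SingleReducerMax {

    // Hypothetical plain-Java stand-in for the Hadoop reducer (no Hadoop involved).
    static final class SimReducer {
        private int bestKey;
        private double bestValue;      // only meaningful once seen == true
        private boolean seen = false;

        // Called once per distinct key, like Reducer.reduce().
        void reduce(int key, Iterable<Double> values) {
            for (double v : values) {
                if (!seen || v > bestValue) {
                    bestKey = key;
                    bestValue = v;
                    seen = true;
                }
            }
        }

        // Called once after all keys, like Reducer.cleanup(); returns null if there was no input.
        Map.Entry<Integer, Double> cleanup() {
            return seen ? Map.entry(bestKey, bestValue) : null;
        }
    }

    // Simulates "map output -> shuffle/sort -> one reducer".
    public static Map.Entry<Integer, Double> run(List<Map.Entry<Integer, Double>> mapOutput) {
        // Shuffle: group values by key, iterating keys in sorted order.
        TreeMap<Integer, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, Double> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // With a single reduce task, one reducer instance sees every key.
        SimReducer reducer = new SimReducer();
        for (Map.Entry<Integer, List<Double>> group : grouped.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue());
        }
        return reducer.cleanup();
    }

    public static void main(String[] args) {
        // Hypothetical data: (day of month, highest temperature that day), all below zero.
        List<Map.Entry<Integer, Double>> days = List.of(
                Map.entry(1, -4.5),
                Map.entry(2, -1.0),
                Map.entry(3, -7.2));
        Map.Entry<Integer, Double> best = run(days);
        System.out.println(best.getKey() + "\t" + best.getValue()); // prints "2\t-1.0"
    }
}
```

The sub-zero sample data is deliberate: a reducer that starts its maximum at 0.0 would report the wrong answer for it. Because the comparison is a strict `>`, a tie keeps the first maximum seen, which in this simulation is the smallest key since keys arrive sorted.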
