Reputation: 109
I am new to Hadoop and I'm trying to run some MapReduce jobs in Java. I was wondering how we can execute a reduce operation over all the key/value pairs.
For example, imagine we have, for each day of a month, the highest temperature recorded on that day. We take the day as the key and the temperature as the value, and I want to get the key/value pair with the highest temperature for the whole month.
I hope my question is clear!
Thank you for your help.
Upvotes: 2
Views: 991
Reputation: 4110
Yes, it's possible. Just configure your job to use a single reducer via job.setNumReduceTasks(1). This single reducer will then see all key/value pairs. In the reduce() method you just search for the maximum, and in the cleanup() method, which runs once after the last reduce() call, you output the final result. Example with (k, v) = (year, temperature); the same works with the day as the key:
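```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Meant to run as the job's only reducer (job.setNumReduceTasks(1)), so it sees every key.
public class MaxTemperatureReducer extends Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {
    // Instance fields, not static: the state belongs to this reducer task only.
    private int year = 0;
    private double maxTemp = Double.NEGATIVE_INFINITY; // also correct for sub-zero temperatures
    private boolean seenAny = false;

    @Override
    public void reduce(IntWritable key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        // Update the running maximum across all keys; nothing is written here.
        for (DoubleWritable value : values) {
            if (value.get() > maxTemp) {
                year = key.get();
                maxTemp = value.get();
            }
            seenAny = true;
        }
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // Called once after the last reduce() call: emit the single overall maximum.
        if (seenAny) {
            context.write(new IntWritable(year), new DoubleWritable(maxTemp));
        }
    }
}
```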
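To see the "track the maximum in reduce(), emit it in cleanup()" logic without a cluster, here is a plain-Java simulation. It is only a sketch: no Hadoop classes are used, the names SingleReducerMaxSketch and run are hypothetical, and the TreeMap grouping is an assumption standing in for the shuffle, which hands a single reduce task its keys in sorted order. The maximum starts at Double.NEGATIVE_INFINITY so a month with only sub-zero temperatures still produces the right answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop classes) of one reducer that keeps a
// running maximum across every key and emits it once at the end.
public class SingleReducerMaxSketch {
    // Per-instance state, like the fields of one reduce task.
    private int bestKey = 0;
    private double maxTemp = Double.NEGATIVE_INFINITY;
    private boolean seenAny = false;
    private final List<String> output = new ArrayList<>();

    // Stand-in for Reducer.reduce(): called once per key, writes nothing.
    void reduce(int key, Iterable<Double> values) {
        for (double v : values) {
            if (v > maxTemp) {
                bestKey = key;
                maxTemp = v;
            }
            seenAny = true;
        }
    }

    // Stand-in for Reducer.cleanup(): called once, emits the overall maximum.
    void cleanup() {
        if (seenAny) {
            output.add(bestKey + "\t" + maxTemp);
        }
    }

    // Hypothetical driver: a TreeMap groups values by key in sorted order,
    // roughly what the shuffle hands to a job's single reduce task.
    public static List<String> run(int[] keys, double[] temps) {
        Map<Integer, List<Double>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(temps[i]);
        }
        SingleReducerMaxSketch reducer = new SingleReducerMaxSketch();
        for (Map.Entry<Integer, List<Double>> e : grouped.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue());
        }
        reducer.cleanup();
        return reducer.output;
    }

    public static void main(String[] args) {
        int[] days = {1, 2, 3, 2};
        double[] temps = {-3.5, 4.0, 1.5, 7.25};
        System.out.println(run(days, temps).get(0)); // prints "2", a tab, then "7.25"
    }
}
```

The output is a single record even though three distinct keys were reduced, which is exactly why the write has to happen in cleanup() rather than in reduce().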
Upvotes: 1
Reputation: 1177
A simpler approach is to use an arbitrary constant key (e.g. "month") and put both the day and the temperature in the value. Then, in your reduce method, find the highest temperature and output both the day and that temperature.
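A minimal plain-Java sketch of this constant-key idea follows. It assumes the value is encoded as the text "day,temperature" (a real job might use Text or a custom Writable instead), and the class and method names are hypothetical; no Hadoop classes are involved, and the list in main() stands in for the one group the shuffle would build for the key "month".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the constant-key idea; no Hadoop classes are used.
public class ConstantKeyMaxSketch {
    // Map side: every record gets the same key, and the value carries both
    // fields as "day,temperature" (this text encoding is an assumption).
    public static Map.Entry<String, String> map(int day, double temp) {
        return new AbstractMap.SimpleEntry<>("month", day + "," + temp);
    }

    // Reduce side: all values share the key "month", so a single reduce call
    // sees every day and returns "day,temperature" for the hottest one.
    public static String reduce(Iterable<String> values) {
        String bestDay = null;
        double best = Double.NEGATIVE_INFINITY;
        for (String v : values) {
            String[] parts = v.split(",");
            double temp = Double.parseDouble(parts[1]);
            if (bestDay == null || temp > best) {
                bestDay = parts[0];
                best = temp;
            }
        }
        return bestDay == null ? null : bestDay + "," + best; // null if no input
    }

    public static void main(String[] args) {
        int[] days = {1, 2, 3};
        double[] temps = {12.5, 18.0, 15.25};
        // Stand-in for the shuffle: collect the values of the one "month" group.
        List<String> monthGroup = new ArrayList<>();
        for (int i = 0; i < days.length; i++) {
            monthGroup.add(map(days[i], temps[i]).getValue());
        }
        System.out.println(reduce(monthGroup)); // prints 2,18.0
    }
}
```

Because every record shares one key, all of a month's values go through a single reduce call. For roughly 30 records that is negligible, but on large inputs it creates the same single-reducer bottleneck as the previous answer.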
Upvotes: 0