Reputation: 190
I have a dataset and I want to calculate the minimum, maximum, and average for each key (for example: userID_1 -- minimum_1 -- maximum_1 -- avg).
This is my reducer. What do I need to change so that it writes all of those values for a single key?
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int visitsCounter = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        float avg;
        for (IntWritable val : values) {
            int currentValue = val.get();
            sum += currentValue;
            visitsCounter++;
            min = Math.min(min, currentValue);
            max = Math.max(max, currentValue);
        }
        // Cast before dividing; int / int would truncate the average.
        avg = (float) sum / visitsCounter;
        // This is where I would like to output (user - min - max - avg) instead of just the sum.
        context.write(key, new IntWritable(sum));
    }
}
Upvotes: 2
Views: 5047
Reputation: 1253
In MapReduce, data flows as key-value pairs through both phases, the map phase and the reduce phase.
So we need to design the key-value pairs for both the map level and the reduce level.
Both the key and the value types are Writables.
A key can be composed of multiple fields, and so can a value.
For atomic values we use IntWritable, DoubleWritable, LongWritable, FloatWritable, etc.
For composite keys and values we use the Text data type or a user-defined data type.
A simple way to handle this scenario is the Text data type: concatenate all the columns into a String and set that String on a Text object. This is inefficient on large data sets because of the many String concatenations.
A more efficient option is a custom (user-defined) data type that implements the Writable or WritableComparable interface from the Hadoop API. The reducer below uses the simple Text approach:
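import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The output value type is now Text instead of IntWritable, so the driver
// has to declare it as well, e.g. job.setOutputValueClass(Text.class).
public static class Reduce extends Reducer<Text, IntWritable, Text, Text> {
    // Reused for every key to avoid allocating a new Text per call.
    private final Text emitValue = new Text();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int visitsCounter = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (IntWritable val : values) {
            int currentValue = val.get();
            sum += currentValue;
            visitsCounter++;
            min = Math.min(min, currentValue);
            max = Math.max(max, currentValue);
        }
        // Cast before dividing; int / int would truncate the average.
        float avg = (float) sum / visitsCounter;
        // Pack min, max, and avg into one tab-separated value.
        emitValue.set(min + "\t" + max + "\t" + avg);
        // Emits: user <tab> min <tab> max <tab> avg
        context.write(key, emitValue);
    }
}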
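For the custom data type route mentioned above, here is a minimal sketch of what such a value class might look like. The name MinMaxAvgWritable and its helper roundTrip are made up for illustration. The write and readFields methods have the same signatures as the ones in Hadoop's Writable interface, but the `implements org.apache.hadoop.io.Writable` clause is left out on purpose so the sketch compiles with the JDK alone; in a real job you would add it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type holding the three statistics for one key.
// Assumption: in a Hadoop project this class would be declared as
// "implements org.apache.hadoop.io.Writable".
class MinMaxAvgWritable {
    private int min;
    private int max;
    private float avg;

    // Hadoop instantiates Writables reflectively, so keep a no-arg constructor.
    MinMaxAvgWritable() {}

    MinMaxAvgWritable(int min, int max, float avg) {
        set(min, max, avg);
    }

    void set(int min, int max, float avg) {
        this.min = min;
        this.max = max;
        this.avg = avg;
    }

    int getMin() { return min; }
    int getMax() { return max; }
    float getAvg() { return avg; }

    // Same signature as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(min);
        out.writeInt(max);
        out.writeFloat(avg);
    }

    // Same signature as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        min = in.readInt();
        max = in.readInt();
        avg = in.readFloat();
    }

    // Text-based output formats print values via toString().
    @Override
    public String toString() {
        return min + "\t" + max + "\t" + avg;
    }

    // Demo helper (not part of any Hadoop API): serialize to bytes and read back.
    static MinMaxAvgWritable roundTrip(MinMaxAvgWritable original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            MinMaxAvgWritable copy = new MinMaxAvgWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a class like this, the reducer would be declared as Reducer<Text, IntWritable, Text, MinMaxAvgWritable>, call set(min, max, avg) on a reused instance, and pass that instance to context.write. The numbers then travel as 12 bytes of binary data instead of a concatenated String, and the String is only built when the final output is written.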
Upvotes: 2