Reputation: 91
I am trying to use MapReduce to sum tab-separated input values, grouped by label. The data looks like this:
1 5.0 4.0 6.0
2 2.0 1.0 3.0
1 3.0 4.0 8.0
The first column is the class label, so I expect the output to be grouped by class label. For this input, the output would be:
label 1: 30.0
label 2: 6.0
Here is the code I tried, but I get the wrong output and unexpected class labels are displayed.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Total {
    public static class Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final static DoubleWritable one = new DoubleWritable();
        private Text word = new Text();
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // The first token is the class label; every later token is emitted as its own value.
            word.set(tokenizer.nextToken());
            while (tokenizer.hasMoreTokens()) {
                one.set(Double.valueOf(tokenizer.nextToken()));
                context.write(word, one);
            }
        }
    }
    public static class Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        private Text firstMsg = new Text();
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            firstMsg.set("label " + key+": Total");
            Double sum = 0.0;
            for (DoubleWritable val : values) {
                sum += val.get();
            }
            context.write(firstMsg, new DoubleWritable(sum));
        }
    }
    //void method implementation also exists
}
Upvotes: 2
Views: 2607
Reputation: 191844
Your objective is to send all values that share a key to the same reducer, so that you can sum the numbers.
So, take this
1 5.0 4.0 6.0
2 2.0 1.0 3.0
1 3.0 4.0 8.0
And essentially create this
1 [(5.0 4.0 6.0), (3.0 4.0 8.0)]
2 [(2.0 1.0 3.0)]
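To make that grouping concrete, here is a minimal plain-Java sketch that imitates locally what the shuffle does between map and reduce. It is not Hadoop code: the class `ShuffleSketch` and the method `groupByLabel` are hypothetical names introduced only for illustration, and a `TreeMap` stands in for the framework's sorted grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: it only imitates the grouping step.
class ShuffleSketch {
    // Groups the remaining values of each line under the line's first token (the label).
    static Map<String, List<String>> groupByLabel(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            if (!tokenizer.hasMoreTokens()) {
                continue; // skip blank lines
            }
            String label = tokenizer.nextToken();
            List<String> rest = new ArrayList<>();
            while (tokenizer.hasMoreTokens()) {
                rest.add(tokenizer.nextToken());
            }
            // One comma-separated string per input line, appended to the label's list.
            grouped.computeIfAbsent(label, k -> new ArrayList<>()).add(String.join(",", rest));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1\t5.0\t4.0\t6.0", "2\t2.0\t1.0\t3.0", "1\t3.0\t4.0\t8.0");
        System.out.println(groupByLabel(input));
    }
}
```

In a real job you never write this step yourself; the framework groups the mapper's output by key before calling the reducer.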
So your map should emit one record per input line: the key (1 or 2) together with the remaining values of that line as a single value, rather than one record per number.
For this, you can use Mapper<LongWritable, Text, Text, Text> (that is, change the output value type to Text).
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    // Build the final label here so the reducer does not have to.
    word.set("label " + tokenizer.nextToken());
    StringBuilder remainder = new StringBuilder();
    while (tokenizer.hasMoreTokens()) {
        remainder.append(tokenizer.nextToken()).append(",");
    }
    // Drop the trailing comma, if any values were appended.
    if (remainder.length() > 0) {
        remainder.setLength(remainder.length() - 1);
    }
    context.write(word, new Text(remainder.toString()));
}
Then, in the Reducer, make it Reducer<Text, Text, Text, DoubleWritable> (read in (Text,Text) pairs), and you now have an Iterable<Text> values, which is an iterable of comma-separated strings that you can parse as doubles to take the cumulative sum.
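The parsing and summing inside the reducer could look roughly like the sketch below. It is written as plain Java over `Iterable<String>` so it runs without Hadoop; `CsvSumSketch` and `sumCommaSeparated` are hypothetical names, not part of any library. In the actual reducer you would call `toString()` on each `Text` value and write the result with `context.write(key, new DoubleWritable(sum))`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing only the summing logic of the reducer.
class CsvSumSketch {
    // Sums every number found in an iterable of comma-separated strings.
    static double sumCommaSeparated(Iterable<String> values) {
        double sum = 0.0;
        for (String value : values) {
            for (String field : value.split(",")) {
                String trimmed = field.trim();
                if (!trimmed.isEmpty()) { // tolerate empty fields or empty values
                    sum += Double.parseDouble(trimmed);
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The values the reducer would receive for label 1 in the example input.
        List<String> labelOne = Arrays.asList("5.0,4.0,6.0", "3.0,4.0,8.0");
        System.out.println("label 1: " + sumCommaSeparated(labelOne));
    }
}
```

Using a primitive `double` accumulator avoids the repeated boxing that a `Double` sum would cause in the loop.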
You don't really need the firstMsg.set piece in the reducer; that can be done in the mapper.
Upvotes: 1