Algo

Reputation: 91

MapReduce to calculate sum of tab separated input values

I am trying to use MapReduce to sum tab-separated input values, grouped by their class label. The data looks like this:

1     5.0    4.0   6.0
2     2.0    1.0   3.0
1     3.0    4.0   8.0

The first column is the class label, so I expect the output to be grouped by class label. For this input the output would be:

label 1: 30.0
label 2: 6.0

Here is the code that I tried, but I am getting wrong output and unexpected class labels are displayed.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Total {

    public static class Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final static DoubleWritable one = new DoubleWritable();
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // The first token is the class label; it is used as the output key.
            word.set(tokenizer.nextToken());
            // Emit one (label, value) pair for every remaining token on the line.
            while (tokenizer.hasMoreTokens()) {
                one.set(Double.valueOf(tokenizer.nextToken()));
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        private Text firstMsg = new Text();

        public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            firstMsg.set("label " + key + ": Total");

            // Add up every value that arrived for this label.
            double sum = 0.0;
            for (DoubleWritable val : values) {
                sum += val.get();
            }

            context.write(firstMsg, new DoubleWritable(sum));
        }
    }
//void method implementation also exists
}

Upvotes: 2

Views: 2607

Answers (1)

OneCricketeer

Reputation: 191844

Your objective is to get all values that share a key delivered to the same reduce call, so that you can sum the numbers.

So, take this

1     5.0    4.0   6.0
2     2.0    1.0   3.0
1     3.0    4.0   8.0

And essentially create this

1     [(5.0    4.0   6.0), (3.0    4.0   8.0)]
2     [(2.0    1.0   3.0)]
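To make that grouping concrete, here is a rough sketch in plain Java with no Hadoop involved. It only imitates what the framework does between map and reduce; `ShuffleSketch` and `groupByLabel` are made-up names for illustration, and it assumes every line starts with a label token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class ShuffleSketch {
    // Groups the remaining columns of each line under the line's first
    // column (the label), imitating what a reducer receives per key.
    static Map<String, List<String>> groupByLabel(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            String label = tokenizer.nextToken();
            List<String> rest = new ArrayList<>();
            while (tokenizer.hasMoreTokens()) {
                rest.add(tokenizer.nextToken());
            }
            // One comma-separated string per input line, collected per label.
            grouped.computeIfAbsent(label, k -> new ArrayList<>())
                   .add(String.join(",", rest));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1\t5.0\t4.0\t6.0",
                "2\t2.0\t1.0\t3.0",
                "1\t3.0\t4.0\t8.0");
        System.out.println(groupByLabel(lines));
        // prints {1=[5.0,4.0,6.0, 3.0,4.0,8.0], 2=[2.0,1.0,3.0]}
    }
}
```

In a real job you never write this step yourself; Hadoop performs it for you once the mapper emits the label as the key.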

So, your map should output just the keys 1 and 2, each paired with the remaining values from that line. It does not need to emit a separate record for every value.

For this, you can use Mapper<LongWritable, Text, Text, Text>. (Change the output datatype to Text)

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();

    // The first token is the class label; it becomes the output key.
    StringTokenizer tokenizer = new StringTokenizer(line);
    word.set("label " + tokenizer.nextToken());

    // Join the remaining tokens into one comma-separated string.
    StringBuilder remainder = new StringBuilder();
    while (tokenizer.hasMoreTokens()) {
        remainder.append(tokenizer.nextToken()).append(",");
    }
    // Drop the trailing comma, if any tokens were appended.
    if (remainder.length() > 0) {
        remainder.setLength(remainder.length() - 1);
    }
    context.write(word, new Text(remainder.toString()));
}

Then, in the Reducer, make it Reducer<Text, Text, Text, DoubleWritable> (so it reads in (Text, Text) pairs). You now have an Iterable<Text> values, which is an iterable of comma-separated strings. Split each one, parse the pieces as doubles, and keep a cumulative sum.
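The parsing and summing part might look roughly like this. It is written as plain Java so it can be run without a cluster; `CsvSum` and `sumAll` are hypothetical names, and the sketch assumes every piece is a valid number. Inside an actual reduce method you would loop over the Iterable<Text>, call toString() on each element, and write new DoubleWritable(sum) for the key.

```java
import java.util.Arrays;
import java.util.List;

class CsvSum {
    // Adds up every number found in every comma-separated string.
    static double sumAll(Iterable<String> values) {
        double sum = 0.0;
        for (String csv : values) {
            for (String part : csv.split(",")) {
                sum += Double.parseDouble(part.trim());
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The values a reducer would see for "label 1" and "label 2".
        List<String> label1 = Arrays.asList("5.0,4.0,6.0", "3.0,4.0,8.0");
        List<String> label2 = Arrays.asList("2.0,1.0,3.0");
        System.out.println("label 1: " + sumAll(label1)); // prints label 1: 30.0
        System.out.println("label 2: " + sumAll(label2)); // prints label 2: 6.0
    }
}
```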

You don't really need the firstMsg.set piece in the reducer - that can be done in the mapper.

Upvotes: 1
