user3019299

Reputation: 199

Cascading Sum operation

I am very new to Cascading. I already know how to do a word count with it, and now I want to do a sum. For example, I have the following input:

a b c 1000
c d e 2000
a s e 5000

I want to SUM the last field. If I simply select that field and do a COUNT, I get output like this:

1000 1
2000 1
5000 1

That is not what I want! I want to SUM all three numbers and label the result "duration", so the output looks like this:

duration 8000

I can name the field "duration", but I don't know how to sum it and put the keyword "duration" in front of the total when writing the output to a file.

This is the code I tried:

... // get duration Field 
// determine the word counts
Pipe pipe = new Pipe("pipe", docPipe);
pipe = new GroupBy(pipe, new Fields("duration"));
pipe = new Every(pipe, Fields.ALL, new Count(), Fields.ALL);

But it gives me the wrong output shown above.

Maybe I shouldn't use Count, but I tried SumBy and it still doesn't work. Can anyone help me?

Upvotes: 1

Views: 1137

Answers (1)

user1995521

Reputation: 305

Since you want the sum over all values, i.e. you want just a single group, the "fields" parameter to GroupBy should be Fields.NONE. Also, since you're summing the duration field, you should make that the argument selector in Every. The following code does what you want:

// Fields.NONE puts every tuple into one single group
pipe = new GroupBy(pipe, Fields.NONE);
// sum the "duration" values of that group
pipe = new Every(pipe, new Fields("duration"), new Sum(), Fields.ALL);
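
To see why the grouping key matters, here is a minimal sketch in plain Java (standard library only, not the Cascading API). It only imitates the two behaviours on the sample data from the question: grouping on the duration value and counting gives one row per distinct value, while putting everything in one group and summing gives a single total. The class name GroupingSketch and the methods countPerValue and sumAll are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingSketch {
    // Last field of each input line from the question.
    static final List<Long> DURATIONS = Arrays.asList(1000L, 2000L, 5000L);

    // Imitates grouping on "duration" and then counting:
    // every distinct value becomes its own group, so each count is 1.
    static Map<Long, Long> countPerValue(List<Long> durations) {
        return durations.stream()
                .collect(Collectors.groupingBy(d -> d, TreeMap::new, Collectors.counting()));
    }

    // Imitates a single group (no grouping key) followed by a sum.
    static long sumAll(List<Long> durations) {
        return durations.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Prints "1000 1", "2000 1", "5000 1": the output the question complains about.
        countPerValue(DURATIONS).forEach((value, count) -> System.out.println(value + " " + count));
        // Prints "duration 8000": the output the question asks for.
        System.out.println("duration " + sumAll(DURATIONS));
    }
}
```

The Cascading snippet above follows the second pattern: with no grouping key there is only one group, so the aggregator sees all three values at once.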

Upvotes: 1
