Reputation: 199
I am very new to Cascading. I know how to do a word count with it, and next I want to do a sum. For example, I have the following input:
a b c 1000
c d e 2000
a s e 5000
I want to sum the last field. If I simply select that field and apply Count, I get output like this:
1000 1
2000 1
5000 1
That is not what I want. I want to sum all three numbers and label the result "duration", so the output looks like:
duration 8000
I can name the field "duration", but I don't know how to sum it and put the keyword "duration" in front of the result when writing the output to a file.
This is the code I tried:
... // get duration Field
// determine the word counts
Pipe pipe = new Pipe("pipe", docPipe);
pipe = new GroupBy(pipe, new Fields("duration"));
pipe = new Every(pipe, Fields.ALL, new Count(), Fields.ALL);
But it gives me the wrong output shown above.
Maybe I shouldn't use Count, but I also tried SumBy and it still doesn't work. Can anyone help me?
Upvotes: 1
Views: 1137
Reputation: 305
Since you want the sum over all values, i.e. you want just a single group, the "fields" parameter to GroupBy should be Fields.NONE. Also, since you're summing the duration field, you should make that the argument selector in Every. The following code does what you want:
pipe = new GroupBy(pipe, Fields.NONE);
pipe = new Every(pipe, new Fields("duration"), new Sum(), Fields.ALL);
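To see why the grouping matters, here is a rough plain-Java sketch (no Cascading involved) of the two behaviours on the sample input from the question. The class and method names are made up for illustration: countPerValue mimics grouping on the duration value and counting each group, while sumLastField mimics a single group whose values are summed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Cascading code.
public class DurationSumSketch {

    // Grouping on the value itself: every distinct value becomes its own
    // group, so each count is 1 for the sample input.
    static Map<String, Long> countPerValue(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            counts.merge(parts[parts.length - 1], 1L, Long::sum);
        }
        return counts;
    }

    // A single group over all rows: add up the last field of every line.
    static long sumLastField(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            total += Long.parseLong(parts[parts.length - 1]);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b c 1000", "c d e 2000", "a s e 5000");
        System.out.println(countPerValue(input));               // {1000=1, 2000=1, 5000=1}
        System.out.println("duration " + sumLastField(input));  // duration 8000
    }
}
```

The first method reproduces the unwanted output from the question; the second gives the single total, with the "duration" label simply prepended when printing.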
Upvotes: 1