Reputation: 576
I have some data like:
type1, 2
type2, 1
type1, 3
type2, 4
type1, 5
type2, 3
type1, 1
type3, 5
type3, 5
I want to group the rows by type and sum the numbers in each group. The expected result is:
type1, 11
type2, 8
type3, 10
Here is my Pig script:
data = LOAD 'my_data.txt' USING PigStorage(',')
       AS (type:chararray, num:double);
a = GROUP data BY type;
result = FOREACH a GENERATE data.type, SUM(data.num);
DUMP result;
But I get this:
({(type1),(type1),(type1),(type1)},11.0)
({(type2),(type2),(type2)},8.0)
({(type3),(type3)},10.0)
How can I output the type just once per record instead of a bag of repeated types? Thanks a lot!
Upvotes: 1
Views: 2315
Reputation: 576
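I finally found the answer: after a GROUP, each record in Pig has a field named group that holds the grouping key. Projecting data.type instead returns the type column of every row in the bag, which is why the type was repeated. The modified code is: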
data = LOAD 'my_data.txt' USING PigStorage(',')
       AS (type:chararray, num:double);
a = GROUP data BY type;
result = FOREACH a GENERATE group, SUM(data.num);
DUMP result;
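To see where the group field comes from, it can help to inspect the schema of the grouped relation. The following is a minimal sketch that reuses the same input file and relation names as above; the output aliases type and total are only illustrative names, not something Pig requires.

```pig
data = LOAD 'my_data.txt' USING PigStorage(',')
       AS (type:chararray, num:double);
a = GROUP data BY type;

-- Print the schema of the grouped relation. It should show two fields:
-- 'group' (the grouping key) and a bag named 'data' (the original rows).
DESCRIBE a;

-- Optionally name the output fields so later statements can refer to them.
-- The aliases 'type' and 'total' are illustrative choices.
result = FOREACH a GENERATE group AS type, SUM(data.num) AS total;
DUMP result;
```

The bag keeps the name of the relation that was grouped (data here), which is why SUM(data.num) still works after the GROUP.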
Hope it helps.
Upvotes: 3