dibugger

Apache Pig: group by and sum data

I have some data like this:

type1, 2
type2, 1
type1, 3
type2, 4
type1, 5
type2, 3
type1, 1
type3, 5
type3, 5

I want to group the rows by type and sum the numbers. The expected result is:

type1, 11
type2, 8
type3, 10

Here is my Pig script:

data = LOAD 'my_data.txt' USING
   PigStorage(',') as (type:chararray, num:double);

a = GROUP data BY type;
result = foreach a generate data.type, SUM(data.num);

Dump result;

But I get this:

({(type1),(type1),(type1),(type1)},11.0)
({(type2),(type2),(type2)},8.0)
({(type3),(type3)},10.0)

How can I get rid of the repeated type values in each record so that each record has just one? Thanks a lot!

Answers (1)

dibugger

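I finally found the answer: after a GROUP, Pig stores the grouping key in a field named group, which holds exactly one value per record. By contrast, data.type projects the type field out of every tuple in the grouped bag, which is why the type was repeated. Generating group instead of data.type fixes it. The modified code is: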

data = LOAD 'my_data.txt' USING
PigStorage(',') as (type:chararray, num:double);

a = GROUP data BY type;
result = foreach a generate group, SUM(data.num);

Dump result;
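To see why group works, here is a sketch that assumes the same my_data.txt as above. It prints the grouped relation's schema with DESCRIBE and names the output fields with AS. The names type and total are just illustrative choices, not anything Pig requires.

```pig
-- Sketch: same input file as above; output field names are illustrative.
data = LOAD 'my_data.txt' USING PigStorage(',') AS (type:chararray, num:double);

a = GROUP data BY type;

-- Prints the schema of a: a field named group (the key, one value per record)
-- and a bag named data holding every input tuple with that key.
DESCRIBE a;

-- Project the key once, sum the num values in the bag, and name both fields.
result = FOREACH a GENERATE group AS type, SUM(data.num) AS total;

DUMP result;
```

With the sample data this should give one record per type (type1 with 11.0, type2 with 8.0, type3 with 10.0). Do not rely on the order of the records, because DUMP does not guarantee one.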

Hope it helps.
