Reputation: 101
I have sample data as below:
(id,code,key,value)
1,A,p,10
2,B,q,20
3,B,p,30
3,B,q,20
3,C,t,60
3,C,q,20
After loading it into Pig, I need output like the following:
O/P:
(A,{(p,10)})
(B,{(q,40),(p,30)})
(C,{(t,60),(q,20)})
The id can be dropped. For each code, I need the sum of all values that share the same key. In the example above, code B has q,20 twice, so the two values are added and become q,40.
Below is my code, but I am not able to get the exact output:
Lo = load 'pivot.txt' using PigStorage (',') as (id:chararray, code:chararray, key:chararray, value:int);
Aa = group Lo by (code);
Bb = foreach Aa {AUX = foreach Lo generate $0,$2,$3; generate group, AUX;}
dump Bb:
(A,{(1,p,10)})
(B,{(3,q,20),(3,p,30),(2,q,20)})
(C,{(3,t,60),(3,q,20)})
I am not able to proceed further; any help is much appreciated.
Thanks, Rohith
Upvotes: 3
Views: 227
Reputation: 2287
Pig Script :
input_data = LOAD 'input.csv' USING PigStorage(',') AS (id:int,code:chararray,key:chararray,value:int);
req_stats = FOREACH(GROUP input_data BY (code,key)) GENERATE FLATTEN(group) AS (code,key), SUM(input_data.value) AS value;
req_stats_fmt = FOREACH(GROUP req_stats BY code) GENERATE group AS code, req_stats.(key,value);
DUMP req_stats_fmt;
Input :
1,A,p,10
2,B,q,20
3,B,p,30
3,B,q,20
3,C,t,60
3,C,q,20
Output : DUMP req_stats_fmt
(A,{(p,10)})
(B,{(q,40),(p,30)})
(C,{(t,60),(q,20)})
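The same script can be written with each step as its own relation, which may make it easier to see where the summing happens. This is only a sketch of the script above: the relation names by_code_key, sums, by_code and result, and the bag alias pairs, are made up for illustration, and the input is assumed to be the same input.csv.

```pig
-- Load the rows; id is read but never used afterwards.
input_data = LOAD 'input.csv' USING PigStorage(',') AS (id:int, code:chararray, key:chararray, value:int);

-- Step 1: one group per (code, key) pair, then sum the values in each group.
-- (by_code_key and sums are illustrative names.)
by_code_key = GROUP input_data BY (code, key);
sums = FOREACH by_code_key GENERATE
    FLATTEN(group) AS (code, key),
    SUM(input_data.value) AS value;

-- Step 2: regroup the summed rows by code and keep only (key, value) in the bag.
-- (by_code, result and pairs are illustrative names.)
by_code = GROUP sums BY code;
result = FOREACH by_code GENERATE group AS code, sums.(key, value) AS pairs;

DUMP result;
```

After step 1, sums holds one row per (code, key) pair, so the two B,q rows have already been merged into (B,q,40). Step 2 only collects those rows into a bag per code. The order of tuples inside each bag is not guaranteed.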
Upvotes: 3