Reputation: 959
I have these data that I need to group by two columns and then sum up two other fields. Suppose the name for these four columns are:OS,device,view,click. I basically want to know the count for each OS and device, how many views they have and how many clicks it have.
(2,3346,1,)
(3,3953,1,1)
(25,4840,1,1)
(2,94840,1,1)
(14,0526,1,1)
(37,4864,1,)
(2,7353,1,)
This is what I have so far
A is data: OS,device,view,click
B = GROUP A BY (OS,device);
Result = FOREACH B {
GENERATE group AS OS,device, SUM(view) AS visits, SUM(click) AS clicks;};
dump Result;
This one won't work, error message is: Projected field [OS] does not exist in schema: group:tuple(OS:int,device:long),B:bag{:tuple(OS:int,device:long,view:int,click:int)}.
Upvotes: 0
Views: 1132
Reputation: 3966
Here is the code which is tested, you are missing FLATTEN:
A = LOAD '/user/root/pig_data' using PigStorage(',') AS (OS, device, view, click);
B = GROUP A BY (OS, device);
RESULT = FOREACH B GENERATE FLATTEN(group) AS (OS, device), SUM(A.view) as views, SUM(A.click) as clicks;
dump RESULT;
Upvotes: 1
Reputation: 2168
I think you meant B
in your example instead of J2
or J3
, which may be in your actual code. Try:
B = GROUP A BY (OS, device);
Result = FOREACH B GENERATE
group.OS AS OS:int,
group.device AS device:long,
SUM(B.view) AS visits:int,
SUM(B.click) AS clicks:int;
dump Result;
Upvotes: 0