Reputation: 1
For the following input:
(id:INT, val:INT, yr:INT);
(1 100 2014)
(1 100 2015)
(1 160 2016)
(2 95 2014)
(2 140 2015)
(2 110 2016)
(3 130 2016)
(4 140 2015)
(4 160 2016)
(5 60 2014)
For each yr, I need to find the highest val and include the corresponding id in the output. The output should be sorted by yr in descending order.
OUTPUT should be:
yr id val
(2016 1 160)
(2016 4 160)
(2015 2 140)
(2015 4 140)
(2014 1 100)
PIG Latin Script:
LOAD data....
grpyr = GROUP data BY yr;
maxperyr = FOREACH grpyr GENERATE group AS maxyr, MAX(data.val) AS maxval;
max = FILTER grpyr BY (data.val == maxperyr.maxval) AND (data.yr == maxperyr.maxyr);
The error is in the FILTER statement: incompatible types in Equal Operator left hand side:bag :tuple(amnt:int) right hand side:int. I also tried filtering on the data table instead of grpyr, but that did not work either.
Is there a better way to do this?
Thanks in advance!
Upvotes: 0
Views: 741
Reputation: 11090
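FILTER is not the right operator for this case. Inside grpyr, data.val is a bag of values rather than a single int, which is why the equality comparison fails with a type error. The way to 'filter' and get the id with the max value for each year is through a JOIN.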
maxperyr = FOREACH grpyr GENERATE group AS maxyr, MAX(data.val) AS maxval;
max_id_yr = JOIN maxperyr BY (maxyr,maxval),data BY (yr,val);
final = FOREACH max_id_yr GENERATE maxperyr::maxyr,data::id,maxperyr::maxval;
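The question also asks for the result sorted by yr in descending order, which the JOIN alone does not guarantee. A minimal sketch of that last step, assuming max_id_yr is the joined relation from above; the alias names result and sorted_result, the AS field names, and the secondary sort on id are my own additions for illustration:

```pig
-- Assumes max_id_yr was built by the JOIN above.
-- Give the projected fields plain names so ORDER can refer to them directly.
result = FOREACH max_id_yr GENERATE maxperyr::maxyr AS yr, data::id AS id, maxperyr::maxval AS val;

-- Newest year first; id ascending is only a tie-breaker for years with several max rows.
sorted_result = ORDER result BY yr DESC, id ASC;

DUMP sorted_result;
```

Because the JOIN keeps every row whose (yr, val) matches the yearly max, ties are preserved: for the sample input this should give ids 1 and 4 for 2016, ids 2 and 4 for 2015, and id 1 for 2014.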
Upvotes: 1