doopnubbie

Reputation: 1

PIG FILTER command on max val in bag of tuples

For the following input:

(id:INT, val:INT, yr:INT);
(1   100    2014)
(1   100    2015)
(1   160    2016)
(2   95     2014)
(2   140    2015)
(2   110    2016)
(3   130    2016)
(4   140    2015)
(4   160    2016)
(5   60     2014)

For each yr, I need to find the highest val. I also need to include the corresponding id in the output. The output should also be sorted by yr in desc order.

OUTPUT should be:

yr   id val 
(2016 1  160)
(2016 4  160)
(2015 2  140)
(2015 4  140)
(2014 1  100)

PIG Latin Script:

LOAD data....
grpyr = GROUP data BY yr;
maxperyr = FOREACH grpyr GENERATE group AS maxyr, MAX(data.val) AS maxval;
max = FILTER grpyr BY (data.val == maxperyr.maxval) AND (data.yr == maxperyr.maxyr);

The error is in the FILTER statement:

incompatible types in Equal Operator left hand side:bag :tuple(amnt:int) right hand side:int

I also tried filtering on the data relation instead of grpyr, but that did not work either.

Is there a better way to do this?

Thanks in advance!

Upvotes: 0

Views: 741

Answers (1)

nobody

Reputation: 11090

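FILTER is not the right command for this case. Inside grpyr, data.val is a bag holding every val for that year, not a single int, which is why the == comparison fails with a type error. The way to 'filter' and get the id that goes with the max value for each year is to JOIN the per-year maxima back onto data.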

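-- grpyr comes from the question: grpyr = GROUP data BY yr;
maxperyr = FOREACH grpyr GENERATE group AS maxyr, MAX(data.val) AS maxval;
-- Joining on (yr, val) keeps only the rows that hold each year's maximum.
max_id_yr = JOIN maxperyr BY (maxyr, maxval), data BY (yr, val);
final = FOREACH max_id_yr GENERATE maxperyr::maxyr AS yr, data::id AS id, maxperyr::maxval AS val;
sorted = ORDER final BY yr DESC;  -- the question asks for yr in descending order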
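
To sanity-check the group, max, and join steps without a Pig installation, the same logic can be sketched in plain Python on the sample rows from the question. This is only an illustration, not Pig code: the lists and dicts stand in for Pig relations and bags, and the tie-break on id in the final sort is an assumption added to make the order deterministic.

```python
from collections import defaultdict

# Sample rows from the question, as (id, val, yr) tuples.
data = [
    (1, 100, 2014), (1, 100, 2015), (1, 160, 2016),
    (2, 95, 2014), (2, 140, 2015), (2, 110, 2016),
    (3, 130, 2016), (4, 140, 2015), (4, 160, 2016),
    (5, 60, 2014),
]

# GROUP data BY yr: each year maps to a "bag" (here, a list) of rows.
grpyr = defaultdict(list)
for row in data:
    grpyr[row[2]].append(row)

# FOREACH grpyr GENERATE group, MAX(data.val): one scalar maximum per year.
maxperyr = {yr: max(val for _, val, _ in rows) for yr, rows in grpyr.items()}

# JOIN ... BY (yr, val): keep rows whose (yr, val) matches a per-year maximum.
final = [(yr, id_, val) for id_, val, yr in data if maxperyr[yr] == val]

# ORDER BY yr DESC; ties broken by id (an assumption, for a stable order).
final.sort(key=lambda r: (-r[0], r[1]))

for row in final:
    print(row)
```

For the sample input, this keeps ids 1 and 4 for 2016 (val 160), ids 2 and 4 for 2015 (val 140), and id 1 for 2014 (val 100). Note that grpyr holds a whole list per year, which mirrors why comparing data.val to a single int fails in the original FILTER.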

Upvotes: 1
