Katrina

Reputation: 409

Lagging/Differences in Pig

Is there a way to generate "column" lags or differences in Pig? Here's an example of what I'm trying to do, transcribed from R:

         Value           Value.Diff
1        0.209                NA
2        0.198            -0.011
3        0.187            -0.011
4        0.176            -0.011
5        0.168            -0.008
6        0.159            -0.009

I realize this might be tricky given the (presumably) distributed nature of Pig's data storage, but thought it might be possible given that Pig 0.11+ allows you to rank tuples.

Upvotes: 0

Views: 273

Answers (1)

Jonathan Packer

Reputation: 586

Something like this should work:

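-- rank (Pig 0.11+) prepends the rank as the first field, $0
-- note: ranking by a field gives tied rows the same rank, so this assumes some_field has no ties
values = rank values by some_field;
values = foreach values generate $0 as this_rank, $0 - 1 as prev_rank, value;
-- a self-join needs a second alias for the same data
copy   = foreach values generate *;
-- match each row with the row one rank before it; left outer keeps the first row
pairs  = join values by prev_rank left outer, copy by this_rank;
-- current value minus previous value; null for the first row (the NA in R)
diffs  = foreach pairs generate values::this_rank as this_rank, values::value - copy::value as diff;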
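
For reference, here is a fuller end-to-end sketch of the same rank-and-self-join idea. It is untested against your data and makes a few assumptions: the input file name `values.txt`, its single `value:double` column, and the aliases `raw`, `ranked`, `cur`, and `prev` are all hypothetical. It also uses `rank` without a `by` clause, which just numbers the rows sequentially, and assumes the rows are loaded in the order you want to difference (which may not hold for input split across several files). Each row's `prev_rank` is joined to the earlier row's `this_rank`, so the difference is current minus previous.

```pig
-- hypothetical input: one value per line, already in the desired order
raw    = load 'values.txt' as (value:double);

-- rank with no BY clause prepends a sequential row number (1, 2, 3, ...) as $0
ranked = rank raw;

-- keep the row number and the row number of the preceding row
cur    = foreach ranked generate $0 as this_rank, $0 - 1 as prev_rank, value;

-- second alias over the same data so it can be self-joined
prev   = foreach cur generate this_rank, value;

-- left outer join keeps the first row, which has no predecessor
pairs  = join cur by prev_rank left outer, prev by this_rank;

-- diff is null for the first row, mirroring the NA in the R output
diffs  = foreach pairs generate cur::this_rank as this_rank,
                                cur::value as value,
                                cur::value - prev::value as diff;

sorted = order diffs by this_rank;
dump sorted;
```

On the sample in the question, the `diff` column should come out close to the `Value.Diff` column, up to floating-point rounding. If you don't need the first row, a plain inner join drops it.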

Upvotes: 1
