Reputation: 409
Is there a way to generate "column" lags or differences in Pig? Here's an example of what I'm trying to do, transcribed from R:
  Value Value.Diff
1 0.209         NA
2 0.198     -0.011
3 0.187     -0.011
4 0.176     -0.011
5 0.168     -0.008
6 0.159     -0.009
I realize this might be tricky given the (presumably) distributed nature of Pig's data storage, but thought it might be possible given that Pig 0.11+ allows you to rank tuples.
Upvotes: 0
Views: 273
Reputation: 586
Something like this should work:
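-- rank prepends the rank as field $0; assumes some_field has no ties, so ranks are consecutive
values = rank values by some_field;
values = foreach values generate $0 as this_rank, $0 - 1 as prev_rank, value;
-- a self-join needs a second alias, so make a copy of the relation
copy = foreach values generate *;
-- match each row (by prev_rank) to the row one rank earlier (copy's this_rank);
-- the inner join drops the first row, which has no predecessor
pairs = join values by prev_rank, copy by this_rank;
-- both inputs carry this_rank, so qualify it with the relation name
diffs = foreach pairs generate values::this_rank as this_rank, values::value - copy::value as diff;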
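To sanity-check the join direction outside Pig, the same rank-and-self-join idea can be sketched in plain Python. This is only an illustration of the logic, not Pig code: the function name `lag_diffs` is made up for the example, the sample numbers are the ones from the question, and the rounding to three decimals is an assumption added to hide floating-point noise.

```python
def lag_diffs(values):
    """Pair each row with the row one rank earlier and subtract (hypothetical helper)."""
    # Rank rows starting at 1, in their existing order.
    ranked = [(rank, v) for rank, v in enumerate(values, start=1)]
    # Lookup keyed on this_rank; plays the role of the copied relation in the join.
    by_rank = {rank: v for rank, v in ranked}
    diffs = []
    for this_rank, value in ranked:
        # prev_rank = this_rank - 1; the first row has no match, so it gets None
        # (what a left outer join would give instead of dropping the row).
        prev = by_rank.get(this_rank - 1)
        diff = None if prev is None else round(value - prev, 3)
        diffs.append((this_rank, diff))
    return diffs


values = [0.209, 0.198, 0.187, 0.176, 0.168, 0.159]
for this_rank, diff in lag_diffs(values):
    print(this_rank, diff)
```

Each row is matched on its own rank minus one, so the difference is current minus previous, which is the sign convention of the Value.Diff column in the question.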
Upvotes: 1