Reputation: 186
I am running a Pig script that flattens a bag, filters it, and groups it back together so that I can pass it to a UDF. The issue is that the dates in the second-to-last field come out in DESC order, when I need them in ASC order.
Example:
((RRRRRRR#,T2,19840101011),
{
(RRRRRRR#,T2,19840101011,3.0,4.25,01-01-2014,31-03-2014,01-03-2014,20140101197),
(RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-02-2014,20140101197),
(RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-01-2014,20140101197)
}) //^ REVERSED ORDER
What I need:
((RRRRRRR#,T2,19840101011),
{
(RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-01-2014,20140101197),
(RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-02-2014,20140101197),
(RRRRRRR#,T2,19840101011,3.0,4.25,01-01-2014,31-03-2014,01-03-2014,20140101197)
}) //^ NEED THIS ORDER
Is there a way to order tuples within a bag? If not, how can I prevent the GROUP command from sorting them this way?
Upvotes: 1
Views: 131
Reputation: 5801
Bags are by definition unordered. Therefore you cannot guarantee that whatever transformations you do will preserve order. However, if at a certain step you need to guarantee an ordering, you can use ORDER BY inside a nested FOREACH:
b = FOREACH a {
    field2_ord = ORDER field2 BY date;
    GENERATE
        field1,
        field2_ord;
};
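Applied to the data in the question, this might look like the sketch below. The input path, relation names, and field names are all assumptions made for illustration, since the question does not show its schema. One caveat: if the dates are stored as `dd-MM-yyyy` strings, a plain string sort compares the day first, so it only gives the right order when the day is the same across tuples (as it happens to be in the example). Converting with the built-in `ToDate` before grouping avoids relying on that.

```pig
-- Hypothetical path and schema; adjust to the real data.
raw = LOAD 'input' USING PigStorage(',') AS (
    id:chararray, type:chararray, key:chararray,
    v1:double, v2:double,
    start_dt:chararray, end_dt:chararray, period_dt:chararray,
    ref:chararray);

-- Parse the dd-MM-yyyy string into a datetime so it sorts chronologically.
with_ts = FOREACH raw GENERATE *, ToDate(period_dt, 'dd-MM-yyyy') AS period_ts;

grouped = GROUP with_ts BY (id, type, key);

-- Sort each group's bag right before it is consumed.
ordered = FOREACH grouped {
    sorted = ORDER with_ts BY period_ts ASC;
    GENERATE group, sorted;
};
```

If the bag is going to a UDF, it is probably safest to call the UDF on `sorted` in that same GENERATE, rather than in a later statement, so that no other transformation sits between the sort and the call.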
Upvotes: 1