Phreakradio

Reputation: 186

Pig reverses order of grouped tuples

I am running a Pig script that flattens a bag, filters it, and groups it back together so that I can pass it to a UDF. The issue is that the dates in the second-to-last field come out in DESC order, when I need them in ASC order.

Example of what I get:

((RRRRRRR#,T2,19840101011),
{
  (RRRRRRR#,T2,19840101011,3.0,4.25,01-01-2014,31-03-2014,01-03-2014,20140101197),
  (RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-02-2014,20140101197),
  (RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-01-2014,20140101197)
})                                                         //^ REVERSED ORDER

What I need:

((RRRRRRR#,T2,19840101011),
{
  (RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-01-2014,20140101197),
  (RRRRRRR#,T2,19840101011,3.0,3.75,01-01-2014,31-03-2014,01-02-2014,20140101197),
  (RRRRRRR#,T2,19840101011,3.0,4.25,01-01-2014,31-03-2014,01-03-2014,20140101197)
})                                                         //^ NEED THIS ORDER

Is there a way to order tuples within a bag? If not, how can I prevent the GROUP command from sorting them this way?

Upvotes: 1

Views: 131

Answers (1)

reo katoa

Reputation: 5801

Bags are unordered by definition, so you cannot rely on any transformation, GROUP included, to preserve the order of the tuples. However, if you need a guaranteed ordering at a particular step, you can apply ORDER ... BY inside a nested FOREACH:

b = FOREACH a {
    field2_ord = ORDER field2 BY date;
    GENERATE
        field1,
        field2_ord;
};
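
Applied to the data in the question, this could look like the sketch below. The relation names, field names, and input path are assumptions, since the question does not show its schema. One thing to watch: the dates are dd-MM-yyyy strings, which sort by day first when compared as text. The sketch therefore adds a datetime column with ToDate (available in Pig 0.11 and later) before grouping and orders on that column.

```pig
-- Hypothetical schema and path; adjust to the real input.
rows = LOAD 'input.csv' USING PigStorage(',') AS (
    id:chararray, type:chararray, key:chararray,
    v1:double, v2:double,
    start_date:chararray, end_date:chararray,
    month_date:chararray, stamp:chararray);

-- Add a sortable datetime column parsed from the dd-MM-yyyy string.
keyed = FOREACH rows GENERATE *, ToDate(month_date, 'dd-MM-yyyy') AS month_dt;

grouped = GROUP keyed BY (id, type, key);

-- Order the tuples inside each group's bag, oldest date first.
-- The bag still carries the helper column month_dt.
sorted = FOREACH grouped {
    by_date = ORDER keyed BY month_dt ASC;
    GENERATE group, by_date;
};
```

If the UDF depends on the order, calling it on by_date inside the same GENERATE (for example MyUdf(by_date), where MyUdf stands in for your own function) hands it the bag right after the ORDER step. If the dates were stored as yyyy-MM-dd instead, ordering on the original string field would already be chronological and the extra column would not be needed.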

Upvotes: 1
