PIG (Hadoop) - rows with variable columns

Question

Playing with Pig, my input file is:

1, 4, 6
1, 2, 7, 9
2, 5, 1
1, 3, 5, 1
2, 6, 2, 8

The first value in each row is the ID; the rest of the row is simply a set of unique values (each row can have a different number of columns).

I want to transpose the above into:

1, 2, 4, 6, 7, 9, 3, 5, 1
2, 5, 1, 6, 2, 8

So basically: GROUP by ID, then flatten the remaining columns of every row in the group and output the result as one row per ID.
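To make the intended result concrete, here is a minimal sketch of the transformation in plain Python (not Pig). The function name `group_rows` and the inline sample list are only for illustration; the real input would come from the file on HDFS.

```python
from collections import defaultdict


def group_rows(lines):
    """Group comma-separated rows by their first field (the ID).

    Returns a dict mapping each ID to the concatenated list of the
    remaining values from every row that shares that ID.
    """
    grouped = defaultdict(list)
    for line in lines:
        # Split on commas and drop surrounding whitespace and empty fields.
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if not fields:
            continue  # skip blank lines
        row_id, values = fields[0], fields[1:]
        grouped[row_id].extend(values)
    return dict(grouped)


# Sample rows copied from the input file above.
rows = ["1, 4, 6", "1, 2, 7, 9", "2, 5, 1", "1, 3, 5, 1", "2, 6, 2, 8"]

for row_id, values in group_rows(rows).items():
    print(", ".join([row_id] + values))
```

For the sample rows this prints `1, 4, 6, 2, 7, 9, 3, 5, 1` and `2, 5, 1, 6, 2, 8`. The values appear in a different order than in my hand-written example, which is fine since order does not matter to me.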

Is PIG even the right approach here? I have a way to do this in M/R, but thought Pig might be ideal for this sort of thing.

Many thanks for any hints.

Duncan

PS: I do not care about the order of the values.

Answers (1)