Lucas

Reputation: 1617

Combining tuples based on a field?

Say I have a structure like

{1001, {{id=1001, count=20, key=a}, {id=1001, count=30, key=b}}}
{1002, {{id=1002, count=40, key=a}, {id=1001, count=50, key=b}}}

And I want to transform it into

{id=1001, a=20, b=30}
{id=1002, a=40, b=50}

What Pig commands can I use to do this?

Upvotes: 0

Views: 1123

Answers (2)

alexeipab

Reputation: 3619

It looks like you are pivoting, similar to the question "Pivoting in Pig". But you already have a bag of tuples. Doing an inner join would be costly, because it adds an extra MapReduce job. To make it fast, filter within a nested FOREACH instead. The modified code would look something like:

inpt = load '..../pig/bag_pivot.txt' as (id : int, b:bag{tuple:(id : int, count : int, key : chararray)});

result = foreach inpt {
    -- keep only the bag entries for each key
    col1 = filter b by key == 'a';
    col2 = filter b by key == 'b';
    -- flatten the count from each filtered bag into its own column
    generate id, flatten(col1.count) as a, flatten(col2.count) as b;
};

Sample input data:

1001    {(1001,20,a),(1001,30,b)}
1002    {(1002,40,a),(1001,50,b)}

Output:

(1001,20,30)
(1002,40,50)
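To make the semantics of the nested FOREACH concrete, here is a minimal plain-Python model (not Pig). The names `records` and `nested_pivot` are hypothetical. It assumes each record is `(outer_id, bag)` with bag tuples `(id, count, key)`, as in the sample input above. It models two documented FLATTEN behaviours: flattening two bags in one GENERATE yields their cross product, and flattening an empty bag discards the row. It also shows why the stray `1001` inside the `1002` bag is harmless: only the outer id is emitted.

```python
from itertools import product

# (outer_id, bag) pairs; each bag element is (id, count, key), like the sample input
records = [
    (1001, [(1001, 20, "a"), (1001, 30, "b")]),
    (1002, [(1002, 40, "a"), (1001, 50, "b")]),
]

def nested_pivot(records):
    """Model of: col1 = FILTER b BY key == 'a'; col2 = FILTER b BY key == 'b';
    GENERATE id, FLATTEN(col1.count), FLATTEN(col2.count)."""
    out = []
    for outer_id, bag in records:
        col1 = [count for (_inner_id, count, key) in bag if key == "a"]
        col2 = [count for (_inner_id, count, key) in bag if key == "b"]
        # Two FLATTENs in one GENERATE produce the cross product of the bags;
        # if either filtered bag is empty, no row is emitted for this id.
        for a, b in product(col1, col2):
            out.append((outer_id, a, b))
    return out

print(nested_pivot(records))  # [(1001, 20, 30), (1002, 40, 50)]
```

In practice this means the script assumes exactly one `a` and one `b` entry per bag: a missing key drops the id, and duplicate keys multiply rows.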

Upvotes: 1

Joe K

Reputation: 18424

Not sure exactly what the format of your starting relation is, but to me it looks like (int, bag:{tuple:(int,int,chararray)})? If so, this should work:

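-- x is your input relation: (int, bag of (int, int, chararray))
flattened = FOREACH x GENERATE $0 AS id, flatten($1) AS (idx:int, count:int, key:chararray);
-- split the flattened rows by key, then join the two halves back on id
a = FILTER flattened BY key == 'a';
b = FILTER flattened BY key == 'b';
joined = JOIN a BY id, b BY id;
result = FOREACH joined GENERATE a::id AS id, a::count AS a, b::count AS b;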
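For comparison, here is a rough plain-Python model (not Pig) of this FLATTEN/FILTER/JOIN pipeline. The names `records` and `join_pivot` are hypothetical, and the input layout is assumed to match the question. It joins on the outer id (`$0`), not on the inner `idx`, so the stray `1001` inside the `1002` bag does not affect the result. Like Pig's default JOIN it is an inner join, so an id that lacks either key is dropped. Unlike Pig, the output order here is deterministic.

```python
from collections import defaultdict

# (outer_id, bag) pairs; each bag element is (idx, count, key)
records = [
    (1001, [(1001, 20, "a"), (1001, 30, "b")]),
    (1002, [(1002, 40, "a"), (1001, 50, "b")]),
]

def join_pivot(records):
    # FOREACH x GENERATE $0 AS id, FLATTEN($1): one row per bag element
    flattened = [(outer_id, idx, count, key)
                 for outer_id, bag in records
                 for (idx, count, key) in bag]
    # FILTER flattened BY key == 'a' / 'b'
    a_rows = [row for row in flattened if row[3] == "a"]
    b_rows = [row for row in flattened if row[3] == "b"]
    # Inner JOIN a BY id, b BY id, modelled as a hash join on the outer id
    b_by_id = defaultdict(list)
    for row in b_rows:
        b_by_id[row[0]].append(row)
    result = []
    for ra in a_rows:
        for rb in b_by_id.get(ra[0], []):
            result.append((ra[0], ra[2], rb[2]))  # (id, a count, b count)
    return result

print(join_pivot(records))  # [(1001, 20, 30), (1002, 40, 50)]
```

Both answers produce the same rows on the sample data. The difference is cost: in Pig, the join needs a shuffle, while the nested-FOREACH version works within each record.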

Upvotes: 1
