Reputation: 10139
I have two sets of tuples and I want to inner join them on the first element, merging the remaining parts into a single tuple. How can I implement this in Pig on Hadoop?
The two input tuple sets:
1,(1,2)
2,(2,3)
1,(b,c,b,c)
2,(c,d,c,d)
Expected output:
1,(1,2,b,c,b,c)
2,(2,3,c,d,c,d)
Thanks in advance, Lin
Upvotes: 4
Views: 1688
Reputation: 2287
One approach: join the two relations on the id, flatten both tuples, then regroup the flattened fields into a single tuple.
Inputs (tab-separated, to match PigStorage('\t') in the script):
dataA :
1 (1,2)
2 (2,3)
dataB:
1 (b,c,b,c)
2 (c,d,c,d)
Pig Script :
-- Load both inputs; each row is an id followed by a tuple.
A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple : tuple(af1:long, af2:long));
B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple : tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));
-- Inner join on the id.
C = JOIN A BY aid, B BY bid;
-- Flatten both tuples so their fields become top-level columns.
D = FOREACH C GENERATE aid AS id, FLATTEN(atuple) AS (af1:long, af2:long), FLATTEN(btuple) AS (bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray);
-- Regroup the flattened fields af1 through bf4 into a single tuple.
E = FOREACH D GENERATE id, (af1..bf4);
DUMP E;
Output of DUMP E:
(1,(1,2,b,c,b,c))
(2,(2,3,c,d,c,d))
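If your Pig version does not accept the range projection inside parentheses on the E line, a variant that should be equivalent is to list the fields explicitly with the built-in TOTUPLE. This is only a sketch, assuming relation D has the schema (id, af1, af2, bf1, bf2, bf3, bf4) produced by the script above:

```pig
-- Alternative to the E line: build the merged tuple with the built-in TOTUPLE.
-- Assumes D is defined as in the script above, with fields id, af1, af2, bf1..bf4.
E = FOREACH D GENERATE id, TOTUPLE(af1, af2, bf1, bf2, bf3, bf4);
DUMP E;
```

Listing the fields explicitly is more verbose, but it makes the order of elements in the merged tuple obvious and does not depend on range-projection support.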
Upvotes: 1