Lin Ma

Reputation: 10139

Merge tuples in Pig

I have two sets of tuples. I want to inner join them on the first element and merge the remaining parts into a single tuple. How can I implement this in Pig on Hadoop?

The two input tuple sets:

1,(1,2)
2,(2,3)

1,(b,c,b,c)
2,(c,d,c,d)

Expected output:

1,(1,2,b,c,b,c)
2,(2,3,c,d,c,d)

Thanks in advance, Lin

Upvotes: 4

Views: 1688

Answers (1)

Murali Rao

Reputation: 2287

One approach: join the two relations on the id, flatten both tuples, then wrap the flattened fields back into a single tuple.

Inputs (tab-separated):

dataA:

1   (1,2)
2   (2,3)

dataB:

1   (b,c,b,c)
2   (c,d,c,d)

Pig script:

-- Load both inputs as (id, tuple) pairs with explicit schemas.
A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple:tuple(af1:long, af2:long));
B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple:tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));
-- Inner join on the id.
C = JOIN A BY aid, B BY bid;
-- Flatten both tuples into top-level fields.
D = FOREACH C GENERATE aid AS id, FLATTEN(atuple) AS (af1:long, af2:long), FLATTEN(btuple) AS (bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray);
-- Regroup the fields af1 through bf4 into one tuple.
E = FOREACH D GENERATE id, (af1..bf4);
DUMP E;

Output of DUMP E:

(1,(1,2,b,c,b,c))
(2,(2,3,c,d,c,d))
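
A possible variant, assuming A and B are loaded with the same aliases and schemas as in the script above: because both tuples have a known, fixed number of fields, the merged tuple could also be built in a single FOREACH using the built-in TOTUPLE and tuple field dereferencing, which skips the intermediate FLATTEN step. This is only a sketch and was not run against the sample data; the alias `merged` is an illustrative name, not something from the original script.

```pig
-- Assumes A and B are loaded exactly as in the script above.
C = JOIN A BY aid, B BY bid;
-- Build the merged tuple directly from the fields of both input tuples.
-- 'merged' is a hypothetical alias chosen for this sketch.
E = FOREACH C GENERATE aid AS id,
    TOTUPLE(atuple.af1, atuple.af2,
            btuple.bf1, btuple.bf2, btuple.bf3, btuple.bf4) AS merged;
DUMP E;
```

The trade-off is that every field has to be listed by name, so the FLATTEN and range projection form is likely more convenient when the tuples are wide.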

Upvotes: 1
