Reputation: 95
Problem: Pig Beginner- Below is my two input table
Table 1: Contain 3 columns (VID, TID and USID)
v1 TID101 US101
v2 TID102
v3 TID103
v4 TID104 US104
v5 US105
v6 US106
Table 2: Contain 3 columns (PID, TID, USID)
p1 TID101 US101
p2 TID102 US102
p3 TID103 US103
p4 TID104 US104
p5 TID105 US105
I would like to join table 1 and 2 and get output as below:
Expected Output:
v1 TID101 US101 p1
v2 TID102 p2
v3 TID103 p3
v4 TID104 US104 p4
v5 US105 p5
I tried inner join as below:
a= JOIN table1 BY (TID, USID), table2 BY (TID, USID);
b= FOREACH a GENERATE table1::vID, table1::TID, table1::USID, table2::PID;
But I get only below output:
Actual Output:
v1 TID101 US101 p1
v4 TID104 US104 p4
I could try left outer join but I feel that when I join by multiple keys, both keys are considered mandatory for joining and I cannot have "OR" condition. All I am trying is to get the PID from table2 if table1 record contain USID or TID. I am not sure what I miss and be interested to understand the best approach to arrive at the expected output. Please help!
Upvotes: 1
Views: 8637
Reputation: 11090
Join on single column,union the result and distinct the final relation.
PigScript
A = LOAD 'test1.txt' USING PigStorage('\t') as (a1:chararray,a2:chararray,a3:chararray);
B = LOAD 'test2.txt' USING PigStorage('\t') as (b1:chararray,b2:chararray,b3:chararray);
A2 = JOIN A BY (a2),B by (b2);
A3 = JOIN B BY (b3),A by (a3);
C = FOREACH A2 GENERATE A::a1,A::a2,A::a3,B::b1;
D = FOREACH A3 GENERATE A::a1,B::b2,B::b3,B::b1;
E = UNION C,D;
E1 = DISTINCT E;
DUMP E1;
Output
Upvotes: 3