Reputation: 20140
I have two stored datasets:
A
ASDFGFG 5 7 8 9
B
ASDFG FG 5 7 8 9
I would like to join these two datasets on A1 and B1+B2. Column 1 of dataset A equals columns 1 and 2 of dataset B concatenated: it is the same value, but it has been split in B. I also know that B1 will always be 5 characters long, though I cannot be sure that B2 will always be 2 characters.
Preferably without modifying the source files, how do I perform such a join?
Upvotes: 1
Views: 56
Reputation: 11090
You can generate a new column in relation B using CONCAT(b1,b2) AS b1_new, and then join A with the new relation, say B_New, on that new column. Assuming your files are tab-delimited:
A = LOAD 'A.txt' USING PigStorage('\t') AS (a1:chararray,a2:int,a3:int,a4:int,a5:int);
B = LOAD 'B.txt' USING PigStorage('\t') AS (b1:chararray,b2:chararray,b3:int,b4:int,b5:int,b6:int);
B_New = FOREACH B GENERATE CONCAT(b1,b2) AS b1_new,b3,b4,b5,b6;
AB = JOIN A BY a1,B_New BY b1_new;
DUMP AB;
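Since the length of b2 is not guaranteed, one case that may be worth guarding against is a row where b2 is empty. Pig's CONCAT returns null if either argument is null, and a null key never matches in an inner join, so such rows would silently drop out. A minimal sketch, assuming an empty b2 is loaded as null and should be treated as an empty string (the names B_Safe and AB_Safe are made up for illustration):

```pig
-- Same tab-delimited layout as assumed above.
A = LOAD 'A.txt' USING PigStorage('\t') AS (a1:chararray,a2:int,a3:int,a4:int,a5:int);
B = LOAD 'B.txt' USING PigStorage('\t') AS (b1:chararray,b2:chararray,b3:int,b4:int,b5:int,b6:int);
-- Assumption: a missing b2 should count as '' so that CONCAT does not return null.
B_Safe = FOREACH B GENERATE CONCAT(b1, (b2 IS NULL ? '' : b2)) AS b1_new,b3,b4,b5,b6;
-- Hypothetical relation names; the join itself is unchanged.
AB_Safe = JOIN A BY a1, B_Safe BY b1_new;
DUMP AB_Safe;
```

If b2 is always present in your data, the plain CONCAT(b1,b2) version is enough.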
Upvotes: 3