Dervin Thunk

Reputation: 20140

How do I join two stores in Apache Pig when the key is divided into two columns?

I have two stored datasets:

A
ASDFGFG 5 7 8 9
B
ASDFG FG 5 7 8 9

I would like to join these two datasets by A1 and B1+2. I know that col 1 in Dataset A is equal to cols 1+2 in Dataset B. They are the same, but are split in B. I also know that B1 will always be 5 chars in length, though I cannot be sure that B2 will be 2 chars.

Preferably without modifying the source files, how do I perform such a join?

Upvotes: 1

Views: 56

Answers (1)

nobody

Reputation: 11090

You can generate a new column in relation B using CONCAT(b1,b2) AS b1_new, then join A with the new relation (say B_New) on that column. Assuming your files are tab-delimited:

A = LOAD 'A.txt' USING PigStorage('\t') AS (a1:chararray,a2:int,a3:int,a4:int,a5:int);
B = LOAD 'B.txt' USING PigStorage('\t') AS (b1:chararray,b2:chararray,b3:int,b4:int,b5:int,b6:int);
B_New = FOREACH B GENERATE CONCAT(b1,b2) AS b1_new,b3,b4,b5,b6;
AB = JOIN A BY a1,B_New BY b1_new;
DUMP AB;
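The sample rows in the question look space-separated rather than tab-separated, so here is a hedged variant of the same approach under that assumption. The file names, the single-space delimiter, and the relation name AB_Slim are all assumptions made for illustration. It also projects the joined relation so the key appears only once, since the plain join carries both a1 and b1_new.

```pig
-- Assumption: fields are separated by a single space, as the sample rows suggest.
A = LOAD 'A.txt' USING PigStorage(' ') AS (a1:chararray, a2:int, a3:int, a4:int, a5:int);
B = LOAD 'B.txt' USING PigStorage(' ') AS (b1:chararray, b2:chararray, b3:int, b4:int, b5:int, b6:int);

-- Rebuild the full key from its two halves; the source files are left untouched.
B_New = FOREACH B GENERATE CONCAT(b1, b2) AS b1_new, b3, b4, b5, b6;

AB = JOIN A BY a1, B_New BY b1_new;

-- Keep the key once; '::' disambiguates fields by the relation they came from.
-- AB_Slim is a hypothetical name chosen for this sketch.
AB_Slim = FOREACH AB GENERATE A::a1 AS key, A::a2, A::a3, A::a4, A::a5,
                              B_New::b3, B_New::b4, B_New::b5, B_New::b6;
DUMP AB_Slim;
```

Because CONCAT simply glues the two strings together, it does not matter whether b2 is 2 characters long or not. One thing to watch for: if b1 or b2 is null in some row, the concatenated key is null as well, and that row will not match anything in an inner join.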

Upvotes: 3
