Navneet

Reputation: 9828

OUTER JOIN of multiple tables in Pig

I need to join multiple tables. The command that I am using is as follows:

G = JOIN aa BY f, bb by f, cc by f, dd by f;

To make it a full outer join, I added a FULL to make it:

G = JOIN aa BY f FULL, bb by f, cc by f, dd by f;

But it gives me a "mismatched input" error. How can I make this work?

Thanks!

Upvotes: 2

Views: 10360

Answers (2)

hobgoblin

Reputation: 935

You can use the COGROUP statement to imitate a full outer join. For example, run a cogroup on the following two files:

Decimal.csv

first|1
second|2
fourth|4

Roman.csv

first|I
second|II
third|III

Pig commands:

english = LOAD 'Decimal.csv' using PigStorage('|') as (name:chararray,value:chararray);
roman = LOAD 'Roman.csv' using PigStorage('|') as (name:chararray, value:chararray);
multi = cogroup english by name, roman by name;
dump multi;

Output:

(first,{(first,1)},{(first,I)})
(third,{},{(third,III)})
(fourth,{(fourth,4)},{})
(second,{(second,2)},{(second,II)})
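The same idea should extend to the four relations in the question, since COGROUP accepts more than two inputs. A minimal sketch, assuming aa, bb, cc and dd are already loaded and each has a field f (the alias grouped is just an illustrative name):

```pig
-- Assumes aa, bb, cc and dd are already defined and each has a field f.
-- One output tuple per distinct key: (group, {aa rows}, {bb rows}, {cc rows}, {dd rows}).
-- A key that is missing from a relation gets an empty bag for that relation.
grouped = COGROUP aa BY f, bb BY f, cc BY f, dd BY f;
DUMP grouped;
```

Note that this yields bags rather than flat joined rows, and applying FLATTEN to an empty bag produces no output row for that key, so empty bags need separate handling if you want one flat record per key.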

Upvotes: 1

Lorand Bendig

Reputation: 10650

According to the Pig documentation:

Outer joins will only work for two-way joins; to perform a multi-way outer join, you will need to perform multiple two-way outer join statements.
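In practice that means chaining two-way joins. The following is a rough sketch, not a definitive solution: the file names, the comma delimiter, and the value columns a, b, c, d are hypothetical, since the question does not show the schemas. After each full outer join the key from either side may be null, so the sketch picks the non-null key before joining again.

```pig
-- Hypothetical inputs: each relation has the join key f plus one value column.
aa = LOAD 'aa.csv' USING PigStorage(',') AS (f:chararray, a:chararray);
bb = LOAD 'bb.csv' USING PigStorage(',') AS (f:chararray, b:chararray);
cc = LOAD 'cc.csv' USING PigStorage(',') AS (f:chararray, c:chararray);
dd = LOAD 'dd.csv' USING PigStorage(',') AS (f:chararray, d:chararray);

-- Step 1: two-way full outer join of aa and bb.
j1 = JOIN aa BY f FULL OUTER, bb BY f;
-- Keep whichever key is non-null so rows found only in bb still carry a key.
ab = FOREACH j1 GENERATE (aa::f IS NOT NULL ? aa::f : bb::f) AS f,
                         aa::a AS a, bb::b AS b;

-- Step 2: join the intermediate result with cc.
j2 = JOIN ab BY f FULL OUTER, cc BY f;
abc = FOREACH j2 GENERATE (ab::f IS NOT NULL ? ab::f : cc::f) AS f,
                          ab::a AS a, ab::b AS b, cc::c AS c;

-- Step 3: join with dd to get the final four-way result.
j3 = JOIN abc BY f FULL OUTER, dd BY f;
G = FOREACH j3 GENERATE (abc::f IS NOT NULL ? abc::f : dd::f) AS f,
                        abc::a AS a, abc::b AS b, abc::c AS c, dd::d AS d;
```

The key selection matters because null keys do not match in a Pig join: joining the second step on aa::f alone would fail to match rows that exist only in bb.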

Upvotes: 7
