warbaque
warbaque

Reputation: 633

pig script: join tables with null values

I'd like to join 2 tables, and I'm a bit lost with different kinds of joins

A(a_name:chararray, a_number:int)
a    1
b    2
c    
d    3
e    

B(b_id:int, b_name:chararray)
1    one
2    two
3    three

I know that I need to some sort of join, but with

AB = JOIN A by a_number, B by b_id;
FOREACH AB GENERATE
    a_name,
    b_name as a_number;

I get

a    one
b    two
d    three

Instead of

a    one
b    two
c    
d    three
e    

which I actually want. How should I do this?

edit:

Ok, I tried left join but it doesn't keep the row order and instead returns

a    one
b    two
d    three
c    
e    

Any workaround?

Upvotes: 1

Views: 1197

Answers (1)

mr2ert
mr2ert

Reputation: 5186

You are looking for a left JOIN.
                                                       A left join
This will keep all values on the left side of the relationship even if they don't appear in the right. Pig defaults to an inner JOIN, so it only keeps values that are in both sides.

This will now generate what you expect.

AB = JOIN A by a_number LEFT, B by b_id;
C = FOREACH AB GENERATE a_name, b_name AS a_number;

Also, you should be able to compact those two relations into:

AB = FOREACH (JOIN A by a_number LEFT, B by b_id)
     GENERATE a_name, b_name AS a_number;

As far as I know there is no option in JOIN to perverse the order of the left relation. However, you can RANK A beforehand then ORDER on the number RANK creates after the JOIN.

Upvotes: 3

Related Questions