Reputation: 633
I'd like to join 2 tables, and I'm a bit lost with different kinds of joins
A(a_name:chararray, a_number:int)
a 1
b 2
c
d 3
e
B(b_id:int, b_name:chararray)
1 one
2 two
3 three
I know that I need to some sort of join, but with
AB = JOIN A by a_number, B by b_id;
FOREACH AB GENERATE
a_name,
b_name as a_number;
I get
a one
b two
d three
Instead of
a one
b two
c
d three
e
which I actually want. How should I do this?
edit:
Ok, I tried left join but it doesn't keep the row order and instead returns
a one
b two
d three
c
e
Any workaround?
Upvotes: 1
Views: 1197
Reputation: 5186
You are looking for a left JOIN
.
This will keep all values on the left side of the relationship even if they don't appear in the right. Pig defaults to an inner JOIN
, so it only keeps values that are in both sides.
This will now generate what you expect.
AB = JOIN A by a_number LEFT, B by b_id;
C = FOREACH AB GENERATE a_name, b_name AS a_number;
Also, you should be able to compact those two relations into:
AB = FOREACH (JOIN A by a_number LEFT, B by b_id)
GENERATE a_name, b_name AS a_number;
As far as I know there is no option in JOIN
to perverse the order of the left relation. However, you can RANK
A
beforehand then ORDER
on the number RANK
creates after the JOIN
.
Upvotes: 3