sds

Reputation: 60004

Default values for missing fields in a left join?

In Pig, when I do a left join and a row in the left relation has no matching row in the right relation, the missing values are NULL:

c = join a by ($0) left, b by ($0);

if

a=((1,10),(2,20))
b=((1,30))

then

c=((1,10,30),(2,20,NULL))

I want to use a default value (say, -1) instead of NULL so that

c=((1,10,30),(2,20,-1))

How do I do that?

If that is impossible within the join itself, how do I change the 3rd column of c afterwards so that it holds the default value instead of NULL?

Upvotes: 3

Views: 3372

Answers (1)

Ruslan

Reputation: 3273

I am not aware of a way to do that within the join statement itself, but you can add another statement:

d = FOREACH c GENERATE $0, $1, (($2 IS NULL) ? -1 : $2);

I don't think it will trigger an additional MapReduce job, since a FOREACH that directly follows the join can normally run inside the same job.
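For completeness, here is a minimal end-to-end sketch of the same idea with named fields. The file names (a.txt, b.txt) and field names (id, x, y) are made up for illustration and are not from the question. Note that in an actual Pig join the result keeps the fields of both relations, including b's copy of the key, so referring to the value column by name (b::y) is safer than by position.

```pig
-- Hypothetical tab-separated inputs; names and schemas are assumptions.
a = LOAD 'a.txt' AS (id:int, x:int);
b = LOAD 'b.txt' AS (id:int, y:int);

-- Left outer join: rows of a without a match in b get NULLs for b's fields.
c = JOIN a BY id LEFT OUTER, b BY id;

-- Replace the NULL in b's value column with the default -1 using a bincond.
d = FOREACH c GENERATE
        a::id AS id,
        a::x  AS x,
        ((b::y IS NULL) ? -1 : b::y) AS y;

DUMP d;
```

With the sample data from the question, d should contain the rows the asker wants, (1,10,30) and (2,20,-1), though the output order is not guaranteed.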

Upvotes: 6
