ihadanny

Reputation: 4483

Projecting all columns of a nested relation in Pig

I have a relation MY_REL that is the result of a join of X and Y:

MY_REL = {X::x1,X::x2,Y::y1,Y::y2}

And I tried to do

Bla = foreach MY_REL generate X;

Pig vomited:

ERROR 1000: Error during parsing. Scalars can be only used with projections

I tried X::* and it throws: invalid alias X.

The ugly workaround: I switched to explicitly writing all column names:

Bla = foreach MY_REL generate X::x1, X::x2;

Is there a nice way to generate all X's columns?

Upvotes: 2

Views: 3154

Answers (2)

Romain

Reputation: 7082

If all your columns have distinct names before the JOIN, you can just use them as-is later, without the relation prefix:

Bla = foreach MY_REL generate x1 + y1, x2 + y2;

If a name is ambiguous, you need to qualify that column with the prefix of its original relation, using the :: operator:

Bla = foreach MY_REL generate x1, x2, y1, Y::y1 AS y2;

And with the new Pig project-range expression (PIG-1693):

Bla = foreach MY_REL generate ..$2, $3 AS y2;
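Applied to the question, a range can select just the fields that came from X. The following is a minimal sketch, assuming a Pig version that includes PIG-1693 (0.9 or later, as far as I know) and assuming X's fields sit at positions $0 and $1 of the joined relation. The file names, types, and join keys are hypothetical.

```pig
-- Hypothetical inputs: file names, types, and join keys are assumptions.
X = LOAD 'x.tsv' AS (x1:int, x2:int);
Y = LOAD 'y.tsv' AS (y1:int, y2:int);

MY_REL = JOIN X BY x1, Y BY y1;
-- MY_REL has the fields X::x1, X::x2, Y::y1, Y::y2 (positions $0 through $3).

-- Project the contiguous range $0..$1, i.e. every field that came from X.
Bla = FOREACH MY_REL GENERATE $0 .. $1;
DESCRIBE Bla;
```

This only works because the fields of X are contiguous in the join output. Range endpoints can also be given as field names rather than positions, but I have not checked how that behaves with ::-prefixed names.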

There is also some discussion in PIG-2511.

Upvotes: 1

Donald Miner

Reputation: 39893

Instead of using JOIN, use COGROUP. COGROUP will create a relation that looks like {group, X: {(x1, x2)}, Y: {(y1, y2)}}, where group holds the key. Therefore, you can do:

Bla = foreach MY_REL GENERATE FLATTEN(X);

Note that X is a bag in there, so you want to flatten it to get its tuples back as top-level fields.
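A fuller sketch of this approach is below; the file names, types, and grouping keys are hypothetical. One caveat is that COGROUP is not a drop-in replacement for an inner JOIN: it also keeps keys that exist on only one side, and flattening X yields each X row once rather than once per matching Y row. If inner-join-like filtering matters, dropping groups with an empty Y bag first is one option.

```pig
-- Hypothetical inputs: file names, types, and grouping keys are assumptions.
X = LOAD 'x.tsv' AS (x1:int, x2:int);
Y = LOAD 'y.tsv' AS (y1:int, y2:int);

MY_REL = COGROUP X BY x1, Y BY y1;
-- Each output tuple holds the key (group) plus one bag per input: X and Y.

-- Optional: keep only keys that have at least one row on the Y side.
MATCHED = FILTER MY_REL BY NOT IsEmpty(Y);

-- Flatten the X bag so that x1 and x2 become top-level fields again.
Bla = FOREACH MATCHED GENERATE FLATTEN(X);
DESCRIBE Bla;
```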

Upvotes: 3
