Lucas

Drop single column in Pig

I'm filtering a table by a list of about 20 IDs. Right now my code looks like this:

A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();
C = JOIN A BY $0, B BY $0;
D = FOREACH C GENERATE $1, $2, $3, $4, ...
STORE D INTO 'foo' USING PigStorage();

What I don't like is the line defining D: to get rid of the joining column, I have to build a new relation that explicitly lists every other column I want to keep (and sometimes that is a lot of columns). I'm wondering if there's something equivalent to:

FILTER B BY $0 IN (A)

or:

DROP $0 FROM C

Upvotes: 4

Answers (2)

Dennis Jaheruddin

Drop column by number

If you want to drop the column at position $5 (positions start at $0), you can do it like so:

D = FOREACH C GENERATE .. $4, $6 .. ;

Drop column by name

Dropping a column by name does not appear to be possible if you know only the name of the column you want to drop. It is possible, however, if you know the names of the columns directly before and after it. To drop the column(s) between colBeforeMyCol and colAfterMyCol, you can do it like so:

aliasAfter = FOREACH aliasBefore GENERATE 
             .. colBeforeMyCol, colAfterMyCol ..;
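As a concrete sketch, assume a hypothetical tab-separated file people.txt with four fields, whose names are declared in the LOAD schema (name-based ranges need a schema, because without one the fields have no names). Dropping age by naming its neighbours might look like this:

```pig
-- Hypothetical input: 'people.txt' and its four-field schema are assumptions for illustration
people = LOAD 'people.txt' USING PigStorage('\t')
         AS (id:chararray, name:chararray, age:int, city:chararray);

-- '.. name' expands to every field from the first one up to and including name (id, name);
-- 'city ..' expands to city and every field after it.
-- The only field not covered by either range is age, so it is dropped.
no_age = FOREACH people GENERATE .. name, city ..;

DUMP no_age;
```

The same pattern works for a block of several adjacent columns: everything strictly between the two named neighbours is left out.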

Upvotes: 3

Chris White

This may be similar to an earlier question, which references a JIRA ticket, https://issues.apache.org/jira/browse/PIG-1693, that explains how you can use the .. notation to denote all the remaining fields:

D = FOREACH C GENERATE $1 .. ;

This assumes Pig 0.9.0 or later.
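Applied to the pipeline from the question, a minimal sketch might look like the following. It assumes ids.txt holds exactly one ID per line, so A contributes a single field to each joined tuple and $0 of C is A's copy of the ID:

```pig
A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();

-- JOIN output lists A's fields first, then B's.
-- With one field per line in ids.txt, $0 is A's copy of the ID
-- and $1 onward are all of B's fields (including B's own ID column).
C = JOIN A BY $0, B BY $0;

-- Keep everything from $1 to the end, dropping only A's join column
D = FOREACH C GENERATE $1 .. ;

STORE D INTO 'foo' USING PigStorage();
```

Because the range is open-ended, the script does not need to know how many columns massive_table has.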

Upvotes: 9