PhilHibbs

Reputation: 943

How to rename the fields in a relation

Consider the following code:

ebook = LOAD '$ebook' USING PigStorage AS (line:chararray);
ranked = RANK ebook;

The relation ranked has two fields: the line number and the text. The text is called line and can be referred to by this alias, but the line number generated by RANK has none. As a consequence, the only way I can refer to it is as $0.

How can I give $0 a name, so that I can refer to it more easily once it has been joined to another data set and is no longer $0?

Upvotes: 1

Views: 1724

Answers (2)

merours

Reputation: 4106

What you want to do is define a schema for your data. The easiest way to do so is to use the AS keyword, just as you are already doing with LOAD.

You can define a schema with three operators: LOAD, STREAM and FOREACH. Here, the easiest way would be the following:

ebook = LOAD '$ebook' USING PigStorage AS (line:chararray);
ranked = RANK ebook;
renamed_ranked = FOREACH ranked GENERATE $0 AS line_number, $1;

You can find more information in the associated documentation.
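To illustrate why the name matters after a join, here is a minimal sketch. The second relation notes, its $notes parameter and its fields are hypothetical and only stand in for "another data set"; the alias line_number is likewise just an illustrative choice.

```pig
-- Load the text and prepend a rank column (RANK yields a long in $0).
ebook = LOAD '$ebook' USING PigStorage() AS (line:chararray);
ranked = RANK ebook;

-- Give $0 a name; the second column keeps its existing alias 'line'.
numbered = FOREACH ranked GENERATE $0 AS line_number, line;
DESCRIBE numbered;

-- Hypothetical second data set keyed by line number (assumption, not from the question).
notes = LOAD '$notes' USING PigStorage() AS (line_number:long, note:chararray);

-- Join on the named field instead of a positional reference.
joined = JOIN numbered BY line_number, notes BY line_number;

-- After a join, fields are disambiguated as relation::field.
result = FOREACH joined GENERATE numbered::line_number AS line_number,
                                 numbered::line AS line,
                                 notes::note AS note;
```

Because the field has a name before the join, the later statements do not depend on which position it ends up in.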

It is also good to know that this operation won't add an iteration to your script. As @ArnonRotem-Gal-Oz said:

Pig doesn't perform the action in a serial manner, i.e. it doesn't do all the ranking and then do another iteration over all the records. The Pig optimizer will do the rename when it assigns the rank. You can see a similar behaviour explained in the Pig cookbook.
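If you want to check how the rename is planned for your own script, one option is EXPLAIN, which prints the plans Pig builds for a relation. This is only a sketch for inspecting the plan; the exact output depends on your Pig version and execution engine, and the alias line_number is an illustrative choice.

```pig
ebook = LOAD '$ebook' USING PigStorage() AS (line:chararray);
ranked = RANK ebook;
renamed_ranked = FOREACH ranked GENERATE $0 AS line_number, $1;

-- Print the logical, physical and execution plans for the renamed relation.
EXPLAIN renamed_ranked;
```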

Upvotes: 2

Arnon Rotem-Gal-Oz

Reputation: 25919

You can add a projection with FOREACH, for example:

named_ranked = FOREACH ranked GENERATE $0 as r,*;
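Note that * expands to every field of ranked, including $0, so the projection above carries the rank twice: once as r and once in its original position. If that is not wanted, a project-range expression is one alternative. A minimal sketch:

```pig
ebook = LOAD '$ebook' USING PigStorage() AS (line:chararray);
ranked = RANK ebook;

-- '$1 ..' projects every field from $1 to the end, so $0 appears only once, as r.
named_ranked = FOREACH ranked GENERATE $0 AS r, $1 ..;
DESCRIBE named_ranked;
```

This keeps working unchanged if more columns are later added to ebook, since the range is open-ended.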

Upvotes: 1
