Reputation: 943
Consider the following code:
ebook = LOAD '$ebook' USING PigStorage AS (line:chararray);
ranked = RANK ebook;
The relation ranked has two fields: the line number and the text. The text is called line and can be referred to by this alias, but the line number generated by RANK has none. As a consequence, the only way I can refer to it is as $0.
How can I give $0 a name, so that I can refer to it more easily once it's been joined to another data set and is no longer $0?
Upvotes: 1
Views: 1724
Reputation: 4106
What you want to do is define a schema for your data. The easiest way to do so is to use the AS keyword, just as you're doing with LOAD.
You can define a schema with three operators: LOAD, STREAM and FOREACH.
Here, the easiest way to do so would be the following:
ebook = LOAD '$ebook' USING PigStorage AS (line:chararray);
ranked = RANK ebook;
-- rank is a reserved keyword in Pig, so the alias uses a different name
renamed_ranked = FOREACH ranked GENERATE $0 AS rank_id, $1;
You can find more information in the associated documentation.
It is also good to know that this operation won't add an iteration to your script. As @ArnonRotem-Gal-Oz said:
Pig doesn't perform the action in a serial manner i.e. it doesn't do all the ranking and then does another iteration on all the records. The pig optimizer will do the rename when it assigns the rank. You can see a similar behaviour explained in the pig cookbook.
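To tie this back to the join mentioned in the question, here is a minimal sketch of how the named field might be used afterwards. The second data set (the $notes parameter and its line_number and note fields) and the alias names are hypothetical, chosen only for illustration.

```pig
-- Name the field produced by RANK so it no longer has to be addressed as $0.
ebook  = LOAD '$ebook' USING PigStorage() AS (line:chararray);
ranked = RANK ebook;
named  = FOREACH ranked GENERATE $0 AS line_number, line;

-- Hypothetical second data set keyed by line number (an assumption, not from the question).
notes  = LOAD '$notes' USING PigStorage() AS (line_number:long, note:chararray);

-- Join on the named field instead of a positional reference.
joined = JOIN named BY line_number, notes BY line_number;

-- After a join, fields are disambiguated with the relation name and '::'.
result = FOREACH joined GENERATE named::line_number AS line_number,
                                 named::line        AS line,
                                 notes::note        AS note;
DESCRIBE result;
```

Running DESCRIBE on each intermediate relation is a convenient way to check which names are available at each step.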
Upvotes: 2
Reputation: 25919
You can add a projection with FOREACH, as follows:
named_ranked = FOREACH ranked GENERATE $0 as r,*;
Upvotes: 1