Reputation: 109
Is it a good idea to recycle relation names in Pig? In particular, if I always use one and only one name throughout the script, such as:
relation1 = LOAD 'myfile.txt';
relation1 = FILTER relation1 BY ($1 > 0);
relation1 = GROUP relation1 BY $2;
does it affect performance, and if so, how exactly?
Upvotes: 1
Views: 185
Reputation: 4724
This is definitely valid in Pig, but it's not recommended. I am pasting the relevant passage from the link below:
It is possible to reuse relation names; for example, this is legitimate:
A = load 'NYSE_dividends' as (exchange, symbol, date, dividends);
A = filter A by dividends > 0;
A = foreach A generate UPPER(symbol);
However, it is not recommended.
It looks here as if you are reassigning A, but really you are creating new relations called A, losing track of the old relations called A. Pig is smart enough to keep up, but it still is not a good practice. It leads to confusion when trying to read your programs (which A am I referring to?) and when reading error messages.
Reference:
http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_general
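For comparison, here is a rough sketch of the script from the question with one alias per step. The alias names (raw, positive, grouped) are made up for illustration and are not from the question or the book; the operations are unchanged, only the names differ.

```pig
-- Same pipeline as in the question, one distinct alias per step.
-- raw, positive, and grouped are hypothetical names chosen for readability.
raw      = LOAD 'myfile.txt';
positive = FILTER raw BY ($1 > 0);
grouped  = GROUP positive BY $2;
```

With distinct names, an error message that mentions an alias points at exactly one statement, which is the readability concern the quoted passage raises.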
Upvotes: 2