djurikom

Reputation: 109

Recycling relation names in Pig

Is it a good idea to recycle relation names in Pig? In particular, if I always use one and only one name throughout the script, such as:

relation1 = LOAD 'myfile.txt';
relation1 = FILTER relation1 BY ($1 > 0);
relation1 = GROUP relation1 BY $2;

does it affect performance, and if so, how exactly?

Upvotes: 1

Views: 185

Answers (1)

Sivasakthi Jayaraman

Reputation: 4724

This is definitely valid in Pig, but it's not recommended. Here is the relevant passage from the reference linked below:

It is possible to reuse relation names; for example, this is legitimate:

A = load 'NYSE_dividends' as (exchange, symbol, date, dividends);
A = filter A by dividends > 0;
A = foreach A generate UPPER(symbol);

However, it is not recommended. It looks here as if you are reassigning A, but really you are creating new relations called A, losing track of the old relations called A. Pig is smart enough to keep up, but it still is not a good practice. It leads to confusion when trying to read your programs (which A am I referring to?) and when reading error messages.
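
For comparison, here is a minimal sketch of the same pipeline with a distinct name for each step. The relation names (divs, paid, symbols) and the field types are my own illustrative choices, not part of the quoted passage. Since each statement defines a new relation either way, I would not expect a difference in performance from the naming alone, though I have not measured it; the benefit is readability and clearer error messages.

```pig
-- Illustrative names and types; only the file name and field names come from the quoted example.
divs    = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
-- Keep only rows with a positive dividend.
paid    = filter divs by dividends > 0;
-- Upper-case the ticker symbol.
symbols = foreach paid generate UPPER(symbol);
```

With names like these, an error message that mentions paid points at exactly one statement, whereas with a recycled name it could refer to any of the three.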

Reference:
http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_general

Upvotes: 2
