Reputation: 3
Quoting O'Reilly:
Tuple: An ordered collection of data elements. Bag: An unordered collection of tuples.
I am fairly new to Pig, and this may be a trivial question, but I need help understanding how a tuple is an "ordered" collection of elements, whereas a bag is not.
Thanks.
Upvotes: 0
Views: 544
Reputation: 936
A tuple is defined as an "ordered collection of elements", whereas a bag is defined as an "unordered collection of tuples".
A simple example makes this clearer:
Suppose there is a college with several branches, e.g. CSE, ME, EC, EI, EN. Each branch has an HOD, an Asst. Professor, a Professor, and a Peon.
Tuple: The details of a single branch. The positions within a branch have a fixed order: the first is the HOD, the 2nd is .., the 3rd is .., and so on.
Bag: The collection of branches. The branches themselves are in no specific order (unordered).
I hope this helps you understand.
Upvotes: 0
Reputation: 561
Think of the simplest example: a nicely formatted, unsorted CSV file.
When you read the file into Pig, each row is a tuple: a collection of fields. Each field has its position, so it makes sense to speak of 'the first field', 'the 3rd field' and 'the last field'.
However, the order of those rows is meaningless. Similarly, the order of tuples in a bag is arbitrary and cannot be relied on.
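To make the distinction concrete, here is a rough sketch in Python rather than Pig Latin. It is only an analogy: the CSV contents, the helper names `read_rows` and `as_bag`, and the choice of `collections.Counter` to stand in for a bag are all assumptions made for illustration, not how Pig stores its data internally.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV contents; each row is (name, branch, age).
CSV_TEXT = "alice,CSE,30\nbob,ME,41\ncarol,EC,35\n"


def read_rows(text):
    """Parse CSV text into a list of tuples, one tuple per row."""
    return [tuple(row) for row in csv.reader(io.StringIO(text))]


def as_bag(rows):
    """Model a bag as a multiset: duplicates are counted, order is ignored."""
    return Counter(rows)


rows = read_rows(CSV_TEXT)
shuffled = list(reversed(rows))  # same rows, different order

# Inside a tuple, position matters: "the first field", "the 3rd field".
first = rows[0]
print(first[0], first[2])

# Reordering the rows leaves the bag unchanged.
print(as_bag(rows) == as_bag(shuffled))

# Reordering the fields gives a different tuple.
print(("alice", "CSE", "30") == ("CSE", "alice", "30"))
```

The first comparison is true and the second is false, which mirrors the definitions: swapping fields changes a tuple, while swapping rows does not change a bag.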
There is an interesting discussion of the concepts here: How do I extract the first tuple from a generated bag (whose size might vary) in PIG?
Upvotes: 1