learner28

Reputation: 3

Pig Data Types: Ordered Tuple vs. Unordered Bag

Quoting O'Reilly:

Tuple: An ordered collection of data elements. Bag: An unordered collection of tuples.

I am fairly new to Pig, and this may be a trivial question, but I need help understanding how a tuple is an "ordered" collection of elements, whereas a bag is not.

Thanks.

Upvotes: 0

Views: 544

Answers (2)

Ankur Singh

Reputation: 936

A tuple is defined as an "ordered collection of elements", whereas a bag is defined as an "unordered collection of tuples".

You can understand this with a simple example:

Suppose there is a college with several branches, e.g. CSE, ME, EC, EI, EN. Each branch has an HOD, an Assistant Professor, a Professor, and a Peon.

Tuple: the details of a single branch. There is an order within each branch: first will be the HOD, 2nd will be .., 3rd will be .., etc.

Bag: the collection of branches. It does not have any specific order (unordered).

Hope this helps you understand.

Upvotes: 0

Ran Locar

Reputation: 561

Think of the simplest example: a nicely formatted, unsorted CSV file.

When you read the file into Pig, each row is a tuple: a collection of fields. Each field has its position; it makes sense to speak of 'the first field', 'the 3rd field' and 'the last field'.

However, the order of those rows is meaningless. Similarly, the order of tuples in a bag is arbitrary and cannot be relied on.

There is an interesting discussion of the concepts here: How do I extract the first tuple from a generated bag (whose size might vary) in PIG?
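
As a rough analogy only (this is Python, not Pig, and the rows, field layout, and the `as_bag` helper are made up for illustration), the distinction can be sketched by modelling a tuple as a Python tuple and a bag as a multiset. Fields inside a tuple have positions, so swapping them gives a different tuple; tuples inside a bag have no positions, so the same tuples in a different arrival order give the same bag. If you want "the first tuple", you have to impose an order yourself.

```python
from collections import Counter

# Hypothetical rows from a CSV file: (name, branch, salary).
# Each row plays the role of a Pig tuple: fields are addressed by position.
row_a = ("alice", "CSE", 50000)
row_b = ("bob", "ME", 42000)


def as_bag(rows):
    """Model a bag as a multiset: duplicates are counted, order is ignored."""
    return Counter(rows)


# The same two tuples, arriving in different orders.
bag_1 = as_bag([row_a, row_b])
bag_2 = as_bag([row_b, row_a])

# Position matters inside a tuple.
first_field = row_a[0]
swapped = ("CSE", "alice", 50000)
tuples_differ = row_a != swapped

# Arrival order does not matter inside a bag.
bags_equal = bag_1 == bag_2

# "First tuple in the bag" is only meaningful after an explicit sort,
# here by the third field (salary), highest first.
top = sorted(bag_1.elements(), key=lambda t: t[2], reverse=True)[0]

print(first_field, tuples_differ, bags_equal, top)
```

In Pig itself the same idea applies: a field can be referenced by position, while picking a particular tuple out of a bag normally means ordering the bag first.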

Upvotes: 1
