Green

Reputation: 597

How does Pig's COGROUP operator work?

How does the COGROUP operator work here? How and why do we get an empty bag in each of the last two lines of the output? (No website I found explains in detail how COGROUP arranges the data.)

A = load 'student' as (name:chararray, age:int, gpa:float);
B = load 'student' as (name:chararray, age:int, gpa:float);
dump B;
(joe,18,2.5)
(sam,,3.0)
(bob,,3.5)

X = cogroup A by age, B by age;
dump X;
(18,{(joe,18,2.5)},{(joe,18,2.5)})
(,{(sam,,3.0),(bob,,3.5)},{})
(,{},{(sam,,3.0),(bob,,3.5)})

Upvotes: 0

Views: 1210

Answers (1)

Imran

Reputation: 36

There is a very clear example in the Definitive Guide book. I hope the snippet below helps you understand the COGROUP concept.

grunt> DUMP A;

(2,Tie)

(4,Coat)

(3,Hat)

(1,Scarf)

grunt> DUMP B;

(Joe,2)

(Hank,4)

(Ali,0)

(Eve,3)

(Hank,2)

grunt> D = COGROUP A BY $0, B BY $1;

grunt> DUMP D;

(0,{},{(Ali,0)})

(1,{(1,Scarf)},{})

(2,{(2,Tie)},{(Joe,2),(Hank,2)})

(3,{(3,Hat)},{(Eve,3)})

(4,{(4,Coat)},{(Hank,4)})

COGROUP generates a tuple for each unique grouping key. The first field of each tuple is the key, and the remaining fields are bags of tuples from the relations with a matching key. The first bag contains the matching tuples from relation A with the same key. Similarly, the second bag contains the matching tuples from relation B with the same key.
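
If it helps to see the grouping spelled out, here is a rough Python model of that paragraph. It is only a sketch of the semantics for non-null keys, not how Pig actually executes the operator; the function name `cogroup` and its parameters are made up for illustration.

```python
from collections import defaultdict

def cogroup(a, a_key, b, b_key):
    """Toy model of COGROUP for non-null keys (hypothetical helper, not Pig code)."""
    # One entry per unique key; each entry holds a bag for A and a bag for B.
    groups = defaultdict(lambda: ([], []))
    for t in a:
        groups[t[a_key]][0].append(t)   # first bag: tuples from A
    for t in b:
        groups[t[b_key]][1].append(t)   # second bag: tuples from B
    # Sorting by key is only to get a stable order for printing.
    return [(k, bag_a, bag_b) for k, (bag_a, bag_b) in sorted(groups.items())]

A = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]
B = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]

D = cogroup(A, 0, B, 1)   # mirrors: D = COGROUP A BY $0, B BY $1;
for row in D:
    print(row)
```

Each printed row has the key first, then the list standing in for the bag from A, then the one for B. Keys that appear in only one relation (0 and 1 here) still get a row, with an empty list on the other side.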

If a relation has no tuples for a particular key, then the bag for that relation is empty. For example, since no one has bought a scarf (with ID 1), the second bag in the tuple for that row is empty. This is an example of an outer join, which is the default type for COGROUP.
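
As for the two rows with an empty bag in your output: as far as I know from the Pig documentation on nulls, when COGROUP runs over more than one relation, records with a null group key from different relations are considered different and are grouped separately. So the null-age rows of A form one group (with nothing from B) and the null-age rows of B form another (with nothing from A). Below is a small Python sketch of that rule, assuming `None` stands in for Pig's null; `cogroup_with_nulls` is a hypothetical name and this is not Pig's implementation.

```python
from collections import defaultdict

def cogroup_with_nulls(a, b, key_index):
    """Toy model of COGROUP where null keys never match across relations."""
    groups = defaultdict(lambda: ([], []))
    for side, relation in enumerate((a, b)):
        for t in relation:
            key = t[key_index]
            # A null key is tagged with the relation it came from, so nulls
            # from A and nulls from B end up in two separate groups.
            group_id = ("null", side) if key is None else ("value", key)
            groups[group_id][side].append(t)
    # Non-null groups first, then the per-relation null groups (order is
    # chosen here for readability only).
    ordered = sorted(g for g in groups if g[0] == "value")
    ordered += sorted(g for g in groups if g[0] == "null")
    rows = []
    for kind, val in ordered:
        bag_a, bag_b = groups[(kind, val)]
        rows.append((val if kind == "value" else None, bag_a, bag_b))
    return rows

student = [("joe", 18, 2.5), ("sam", None, 3.0), ("bob", None, 3.5)]

X = cogroup_with_nulls(student, student, 1)   # mirrors: cogroup A by age, B by age
for row in X:
    print(row)
```

This gives one row for age 18 with joe in both bags, then one null-key row holding sam and bob from A with an empty second bag, then one null-key row holding sam and bob from B with an empty first bag, which is the same shape as the `dump X` output in the question.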

Upvotes: 2
