proutray

Reputation: 2033

What is the difference between GROUP and COGROUP in PIG?

My understanding was that GROUP does not work with multiple relations, which is why Pig has COGROUP. However, when I tried it today, GROUP worked on two relations. I am using Pig 0.12.0. My commands and their output are as follows.

grunt> grpvar = GROUP C by $2, B by $2;
grunt> cogrpvar = COGROUP C by $2, B by $2;
grunt> describe grpvar;

grpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}

grunt> describe cogrpvar;

cogrpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}

Is GROUP expected to work like this? What is the difference between GROUP and COGROUP?

Upvotes: 6

Views: 17807

Answers (1)

Tibo R

Reputation: 364

Yes, GROUP is supposed to work like that!

According to the documentation (http://pig.apache.org/docs/r0.12.0/basic.html#group):

Note: The GROUP and COGROUP operators are identical. Both operators work with one or more relations. For readability GROUP is used in statements involving one relation and COGROUP is used in statements involving two or more relations. You can COGROUP up to but no more than 127 relations at a time.

So the choice between them is purely a matter of readability; there is no functional difference between the two.
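As a rough illustration of the convention, here is a minimal sketch. The file names, the comma delimiter, and the alias names (by_group, by_cogroup, single, counts) are assumptions made up for the example; the schema is copied from the DESCRIBE output in the question, where $2 corresponds to the drug field.

```pig
-- Hypothetical inputs: 'b.txt' and 'c.txt' are placeholder paths, and the
-- comma delimiter is an assumption. The schema mirrors the question's output.
B = LOAD 'b.txt' USING PigStorage(',')
    AS (pid:int, pname:chararray, drug:chararray, gender:chararray, tot_amt:int);
C = LOAD 'c.txt' USING PigStorage(',')
    AS (pid:int, pname:chararray, drug:chararray, gender:chararray, tot_amt:int);

-- One relation: GROUP is the conventional spelling.
single = GROUP C BY drug;

-- Two relations: both keywords are accepted; COGROUP is the conventional one.
by_group   = GROUP C BY drug, B BY drug;
by_cogroup = COGROUP C BY drug, B BY drug;

-- Both aliases should report the same schema: the group key plus one bag per input.
DESCRIBE by_group;
DESCRIBE by_cogroup;

-- Each output tuple carries one bag per input relation, so per-relation
-- aggregates can be computed side by side for every key.
counts = FOREACH by_cogroup GENERATE group, COUNT(C) AS c_rows, COUNT(B) AS b_rows;
DUMP counts;
```

Swapping by_cogroup for by_group in the last FOREACH should give the same result, which is the point of the documentation note quoted above.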

Upvotes: 8
