ComputerFellow

Reputation: 12108

Count frequency of values in a column in PIG?

I have something like this:

ColA ColB
a    xxx
b    yyy
c    xxx
d    yyy
e    xxx

I need to find out the number of times each value of ColB occurs.

Output:

xxx 3
yyy 2

Here's what I've been trying:

Assuming A holds my data:

grunt> B = GROUP A by ColB;
grunt> DESCRIBE B;
B: {group: chararray,A: {(ColA: chararray,ColB: chararray)}}

Now I'm confused. Do I do something like this?

grunt> C = FOREACH B GENERATE COUNT(B.ColB)

I need the output to look like this:

xxx 3
yyy 2

Upvotes: 1

Views: 2817

Answers (2)

WinnieTran

Reputation: 1

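Use lower-case 'group' before 'as'. The key field produced by GROUP is literally named group (see the DESCRIBE B output in the question), and Pig field names are case-sensitive. This works for me: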

C = FOREACH B GENERATE group as ColB, COUNT(A) as count;

Upvotes: 0

ComputerFellow

Reputation: 12108

I figured it out.

C = FOREACH B GENERATE group AS ColB, COUNT(A) AS count;
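
For anyone landing here, a minimal end-to-end sketch of the whole pipeline follows. It rests on a few assumptions that are not in the question: the file name data.txt is hypothetical, the file is tab-separated, and it has no header row. The alias cnt is my own choice, used here to avoid any confusion with the COUNT function. The key field is referenced in lower case, matching the name group shown by DESCRIBE B.

```pig
-- Hypothetical input: a tab-separated file data.txt with two columns and no header row.
A = LOAD 'data.txt' USING PigStorage('\t') AS (ColA:chararray, ColB:chararray);

-- Group rows by ColB. Each resulting tuple is (group, A),
-- where A is a bag holding every input row that shares that ColB value.
B = GROUP A BY ColB;

-- The key field is named "group" (lower case). COUNT takes the bag A,
-- not B.ColB, because B is the relation being iterated, not a field inside it.
C = FOREACH B GENERATE group AS ColB, COUNT(A) AS cnt;

DUMP C;
```

With the sample data from the question this should give (xxx,3) and (yyy,2). Pig does not guarantee the order of the rows unless you add an ORDER step.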

Upvotes: 1
