P McGuire
P McGuire

Reputation: 5

Apache Pig Distinct and Count

I'm trying to figure out the following question.

How many female users provided at least one rating of 4. I think my join and filters are correct but I cant figure out the distinct count part Have tried numerous versions of the below.

a = load '/user/pig/movie' AS (userid:int, movieid:int, rating:int, timestamp:chararray);
b = load '/user/pig/reviewer' using PigStorage('|') AS (userid:int, age:int, gender:chararray, occupation:chararray, zip:chararray);
a1 = filter a by rating == 4;
b1 = filter b by gender == 'F';
c = join a1 by userid, b1 by userid;
d = FOREACH c GENERATE COUNT(DISTINCT(userid));
dump d;

Upvotes: 0

Views: 824

Answers (1)

nobody
nobody

Reputation: 11080

You have to GROUP before COUNT.Ref:COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts.

d = GROUP c BY userid;
e = FOREACH d GENERATE COUNT(DISTINCT(b1.userid));
dump e;

Upvotes: 1

Related Questions