Reputation: 33
I have the following data set for a movie database:
Ratings: UserID, MovieID, Rating :: Movies: MovieID, Title :: Users: UserID, Gender, Age
Now Ive joined ratings and users. The goal is to determine each movieID rating by genders F and M together. Also to include movies which have atleast 20 ratings by F and M.
data = JOIN myuser BY user, myrating BY user;
grouped_users = GROUP data BY (movie,gender);
Now after the grouped_users, I need to filter out movie which has less than 20 ratings both genders. How can I do that?
grouped_users_twenty = FILTER grouped_users BY SIZE(grouped_users)>=20;
This is my logic. Getting errors.
Upvotes: 0
Views: 32
Reputation: 11080
You have to use COUNT and not SIZE.
grouped_users_twenty = FOREACH grouped_users GENERATE group,COUNT(rating) as rating_count;
final = FILTER grouped_users_twenty BY rating_count >= 20;
Upvotes: 0
Reputation: 2478
data = JOIN myuser BY user, myrating BY user;
grouped_users = foreach (GROUP data BY (movie,gender)) {
generate
group.movie,
group.gender,
SIZE(data) as user_size
;
};
grouped_users_twenty = FILTER grouped_users BY user_size>=20;
Upvotes: 0