Ali
Ali

Reputation: 33

PIG: using conditions

I have the following data set for a movie database:

Ratings: UserID, MovieID, Rating :: Movies: MovieID, Title :: Users: UserID, Gender, Age

Now Ive joined ratings and users. The goal is to determine each movieID rating by genders F and M together. Also to include movies which have atleast 20 ratings by F and M.

data = JOIN myuser BY user, myrating BY user;
grouped_users = GROUP data BY (movie,gender);

Now after the grouped_users, I need to filter out movie which has less than 20 ratings both genders. How can I do that?

grouped_users_twenty = FILTER grouped_users BY SIZE(grouped_users)>=20;

This is my logic. Getting errors.

Upvotes: 0

Views: 32

Answers (2)

nobody
nobody

Reputation: 11080

You have to use COUNT and not SIZE.

grouped_users_twenty = FOREACH grouped_users GENERATE group,COUNT(rating) as rating_count;
final = FILTER grouped_users_twenty BY rating_count >= 20;

Upvotes: 0

Alexey
Alexey

Reputation: 2478

data = JOIN myuser BY user, myrating BY user;
grouped_users = foreach (GROUP data BY (movie,gender)) {
    generate
        group.movie,
        group.gender,
        SIZE(data) as user_size
    ;
};

grouped_users_twenty = FILTER grouped_users BY user_size>=20;

Upvotes: 0

Related Questions