Yeo Keat

Reputation: 143

How do I count how many users have rated a specific movieId?

I would like to count how many users have rated a specific movieId. I have tried using pandas.iloc, but the result is still not what I expected. The expected output is as follows:

For example, I am using the MovieLens data set, and let's say movieId 302 has actually been rated by a total of 10 userIds.

The data is in a DataFrame. In your opinion, what method should I try to get the expected result? I would truly appreciate the chance to learn from you. Thank you.

!wget "http://files.grouplens.org/datasets/movielens/ml-100k.zip"
!unzip ml-100k.zip
!ls

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("ml-100k/u.data", sep='\t', names=["userId", "movieId", "rating", "timestamp"])
data

Upvotes: 0

Views: 903

Answers (1)

Gorlomi

Reputation: 515

Assuming that a single user can't rate the same movie twice, you could start with the following (df here is your dataframe, called data in the question):

df.groupby('movieId')['userId'].count().reset_index(name='userIdCount')

(The reset_index() call turns the result back into a DataFrame.)

You would then have something like:

    movieId userIdCount
0   1       5
1   2       1
2   3       2

If a userId might appear more than once for the same movie and you want each user counted only once, you can use nunique() instead:

df.groupby('movieId')['userId'].nunique().reset_index(name='userIdCount')
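To get the number for one specific movieId (such as 302 from the question), you can look it up in the grouped result or filter the rows first. Here is a minimal sketch using a small made-up table in place of the real MovieLens file; the values and the names users_per_movie, count_302 and count_302_direct are just for illustration:

```python
import pandas as pd

# Toy stand-in for the MovieLens ratings table (values are made up for illustration).
data = pd.DataFrame({
    "userId":  [1, 2, 3, 1, 2, 3],
    "movieId": [302, 302, 302, 50, 50, 302],
    "rating":  [4, 5, 3, 2, 4, 5],
})

# Number of distinct users per movie, as a Series indexed by movieId.
users_per_movie = data.groupby("movieId")["userId"].nunique()

# Look up one movie; .get returns the default (0) if the movieId never appears.
count_302 = users_per_movie.get(302, 0)

# Equivalent without grouping: filter the rows first, then count distinct users.
count_302_direct = data.loc[data["movieId"] == 302, "userId"].nunique()

print(count_302, count_302_direct)
```

The filtered version is probably simpler if you only need one movie; the grouped version is handier if you will look up many movies.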

Upvotes: 1
