I would like to count how many users have rated a specific movieId. I have tried using pandas `iloc`, but the result is still not what I expected. The expected output is as follows:
For example, I am using the MovieLens data set, and let's say movieId 302 has been rated by 10 different userIds in total; then the result for movieId 302 should be 10.
The data is in a DataFrame. In your opinion, what method should I try to get the expected result? I would truly appreciate learning from you. Thank you.
```
!wget "http://files.grouplens.org/datasets/movielens/ml-100k.zip"
!unzip ml-100k.zip
!ls

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# u.data is tab-separated and has no header row
data = pd.read_csv("ml-100k/u.data", sep="\t",
                   names=["userId", "movieId", "rating", "timestamp"])
data
```
Assuming a single user can't rate the same movie twice, you could start with the following (here `df` is the ratings DataFrame, called `data` in the question):
```
df.groupby('movieId')['userId'].count().reset_index(name='userIdCount')
```
(`reset_index()` turns the resulting Series back into a DataFrame.)
You would then get something like:
```
   movieId  userIdCount
0        1            5
1        2            1
2        3            2
```
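A minimal sketch of the same idea on a small hypothetical table (the userIds and ratings below are made up so that the counts match the example output, they are not MovieLens data). It also shows one way to read off the count for a single movieId, which is what the question asks for; with the real data you would use `data` and e.g. `movie_id = 302`.

```python
import pandas as pd

# Hypothetical toy ratings table with the same column names as the question's DataFrame
df = pd.DataFrame({
    "userId":  [10, 11, 12, 13, 14, 10, 10, 11],
    "movieId": [1, 1, 1, 1, 1, 2, 3, 3],
    "rating":  [4, 5, 3, 2, 4, 5, 1, 3],
})

# Number of rating rows per movie, turned back into a DataFrame
counts = df.groupby("movieId")["userId"].count().reset_index(name="userIdCount")
print(counts)

# Look up the count for one specific movieId
movie_id = 1
n_users = counts.loc[counts["movieId"] == movie_id, "userIdCount"].iloc[0]
print(n_users)
```

If you only ever need lookups, you can skip `reset_index()` and index the grouped Series directly by movieId, e.g. `df.groupby("movieId")["userId"].count()[302]`.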
If a user might have rated the same movie more than once and you want to count each userId only once, use `nunique()` instead of `count()`:
```
df.groupby('movieId')['userId'].nunique().reset_index(name='userIdCount')
```
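To see why the distinction matters, here is a small sketch on hypothetical data in which user 10 has rated movie 302 twice: `count()` counts rows, while `nunique()` counts distinct users. It also shows that, for a single movie, you can filter first and count without grouping the whole table.

```python
import pandas as pd

# Hypothetical data: user 10 appears twice for movie 302
df = pd.DataFrame({
    "userId":  [10, 10, 11, 12, 11],
    "movieId": [302, 302, 302, 302, 50],
})

# count() counts rows, so the duplicate rating is counted twice
by_rows = df.groupby("movieId")["userId"].count()
# nunique() counts distinct userIds, so user 10 is counted once
by_users = df.groupby("movieId")["userId"].nunique()
print(by_rows[302], by_users[302])  # 4 3

# For one specific movie: filter the rows, then count distinct users
n_users_302 = df.loc[df["movieId"] == 302, "userId"].nunique()
print(n_users_302)  # 3
```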