Reputation: 4353
I am working with a book rating dataset of the form
userID | ISBN | Rating
23413 1232 2.5
12321 2311 3.2
23413 2532 1.7
23413 7853 3.8
Now I need to add a fourth column that contains the number of ratings each user has in the entire dataset:
userID | ISBN | Rating | Ratings_per_user
23413 1232 2.5 3
12321 2311 3.2 1
23413 2532 1.7 3
23413 7853 3.8 3
I have tried:
df_new['Ratings_per_user'] = df_new['userID'].value_counts()
but I get an error:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
and the entire new column is filled with NaN.
Upvotes: 3
Views: 661
Reputation: 828
You can use map:
df['Rating per user'] = df['userID'].map(df.groupby('userID')['Rating'].count())
print(df)
userID ISBN Rating Rating per user
0 23413 1232 2.5 3
1 12321 2311 3.2 1
2 23413 2532 1.7 3
3 23413 7853 3.8 3
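For what it's worth, the NaN column in the question looks like an index-alignment effect: value_counts() returns a Series indexed by the userID values, while column assignment aligns on the frame's row labels. A minimal sketch, assuming the default 0..3 row index from the sample data, and using reindex only to imitate what the direct assignment does:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userID": [23413, 12321, 23413, 23413],
    "ISBN": [1232, 2311, 2532, 7853],
    "Rating": [2.5, 3.2, 1.7, 3.8],
})

# counts is indexed by userID (23413, 12321), not by row label (0..3)
counts = df["userID"].value_counts()

# Aligning on row labels finds no matching index entries, so every value is NaN
misaligned = counts.reindex(df.index)

# map looks up each row's userID in counts, so the values line up with the rows
aligned = df["userID"].map(counts)

print(misaligned.isna().all())
print(aligned.tolist())
```

So the lookup has to go through the userID values rather than the row index, which is what map does here.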
Upvotes: 0
Reputation: 75080
Use groupby with transform('count'), which returns one value per original row:
df_new['Ratings_per_user']=df_new.groupby('userID')['userID'].transform('count')
userID ISBN rating Ratings_per_user
0 23413 1232 2.5 3
1 12321 2311 3.2 1
2 23413 2532 1.7 3
3 23413 7853 3.8 3
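The "set on a copy of a slice" message usually means df_new was itself derived from another DataFrame. A sketch under that assumption (the filtering step below is hypothetical, the question does not show how df_new was built): taking an explicit copy first makes the assignment unambiguous.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userID": [23413, 12321, 23413, 23413],
    "ISBN": [1232, 2311, 2532, 7853],
    "Rating": [2.5, 3.2, 1.7, 3.8],
})

# Hypothetical filter standing in for however df_new was created;
# .copy() makes df_new an independent frame rather than a slice of df
df_new = df[df["Rating"] > 0].copy()

# transform('count') broadcasts each group's size back onto its rows
df_new["Ratings_per_user"] = (
    df_new.groupby("userID")["userID"].transform("count")
)

print(df_new)
```

The original df is left untouched; only df_new gets the extra column.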
Upvotes: 1
Reputation: 13401
Convert the result of value_counts into a dict, and then use replace to create a new column with each user's rating count:
x = df['userID'].value_counts().to_dict()
df['rating_per_user'] = df['userID'].replace(x)
print(df)
Output:
userID ISBN rating rating_per_user
0 23413 1232 2.5 3
1 12321 2311 3.2 1
2 23413 2532 1.7 3
3 23413 7853 3.8 3
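One caveat worth knowing: replace leaves any value that is not a key in the dict unchanged, whereas map returns NaN for it. With counts built from the same column every userID is a key, so both give the same result. The difference only shows up if the lookup is incomplete, for example when the counts come from a different frame. A small sketch with a deliberately incomplete, made-up dict:

```python
import pandas as pd

# userID column copied from the question's sample data
user_ids = pd.Series([23413, 12321, 23413, 23413], name="userID")

# Deliberately incomplete lookup (hypothetical): 12321 is missing
partial = {23413: 3}

# replace keeps the unmatched userID, which then sits in the count column
via_replace = user_ids.replace(partial)

# map yields NaN for the unmatched userID, which is easier to spot
via_map = user_ids.map(partial)

print(via_replace.tolist())
print(via_map.isna().tolist())
```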
Upvotes: 1