Reputation: 2772
I have this code that centers the ratings given by users, i.e. subtracts each user's mean rating from that user's ratings:
for u in users:
    watched.loc[watched.user_id == u, 'rating'] -= watched.loc[watched.user_id == u, 'rating'].mean()
I have about 2,000 users and 200,000 ratings in total. The code above takes about 20 seconds.
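A scaled-down reproduction can help show where the time goes. The sketch below is built on assumptions: the `make_ratings` and `center_with_loop` helpers and the random 1 to 5 ratings are hypothetical, and it loops over `watched.user_id.unique()` because `users` is not defined in the question. Each pass evaluates `watched.user_id == u` over every row, twice, so just building the masks costs roughly 2 × 2,000 × 200,000 = 800 million element comparisons at the question's size.

```python
import time

import numpy as np
import pandas as pd


def make_ratings(n_users: int, n_ratings: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical synthetic stand-in for the question's `watched` frame."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "user_id": rng.integers(0, n_users, size=n_ratings),
        # Float ratings, so subtracting a mean keeps the column dtype unchanged.
        "rating": rng.integers(1, 6, size=n_ratings).astype(float),
    })


def center_with_loop(watched: pd.DataFrame) -> pd.DataFrame:
    """The question's loop, applied to a copy so the input is left untouched."""
    watched = watched.copy()
    # Assumption: `users` is every distinct user_id in the frame.
    for u in watched.user_id.unique():
        # Both sides build a full-length boolean mask, so the total work
        # grows roughly as (number of users) x (number of rows).
        watched.loc[watched.user_id == u, 'rating'] -= watched.loc[watched.user_id == u, 'rating'].mean()
    return watched


if __name__ == "__main__":
    # Scaled down so it finishes quickly; use 2_000 / 200_000 to match the question.
    df = make_ratings(n_users=200, n_ratings=20_000)
    start = time.perf_counter()
    centered = center_with_loop(df)
    print(f"loop: {time.perf_counter() - start:.2f} s")
```

Actual timings depend on the machine and the pandas version, so the printed number is only useful for comparing approaches on the same setup.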
If I try to set the index first with
watched.set_index('user_id', inplace=True)
I get this error:
ValueError: Must have equal len keys and value when setting with an iterable
Upvotes: 0
Views: 159
Reputation: 214927
Looping and filtering is a very slow approach. The canonical approach in pandas is to group by the variable you want to split on, calculate the average of each group, and then update the column in a single vectorized operation. To keep the original length, you can use groupby.transform to calculate the mean by user_id and then subtract it from the rating column:
watched.rating -= watched.rating.groupby(watched.user_id).transform('mean')
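Here is a small self-contained sketch of the same idea, using a made-up five-row frame (the data and the helper names `center_with_transform` and `center_with_map` are hypothetical). `transform('mean')` returns a Series aligned to the original rows, each row holding its own user's mean, which is why it can be subtracted directly. The `map`-based variant is an alternative not taken from the answer: it computes one mean per user and broadcasts it back through `user_id`, and should give the same result.

```python
import numpy as np
import pandas as pd


def center_with_transform(watched: pd.DataFrame) -> pd.DataFrame:
    """Subtract each user's mean rating using groupby.transform."""
    watched = watched.copy()
    # transform('mean') keeps the original length and index: every row
    # gets the mean of its own user_id group.
    watched['rating'] -= watched['rating'].groupby(watched['user_id']).transform('mean')
    return watched


def center_with_map(watched: pd.DataFrame) -> pd.DataFrame:
    """Alternative: one mean per user, mapped back onto each row."""
    watched = watched.copy()
    user_means = watched.groupby('user_id')['rating'].mean()  # indexed by user_id
    watched['rating'] -= watched['user_id'].map(user_means)
    return watched


if __name__ == "__main__":
    watched = pd.DataFrame({
        'user_id': [1, 1, 2, 2, 2],
        'rating': [4.0, 2.0, 5.0, 3.0, 1.0],
    })
    print(center_with_transform(watched))
    print(center_with_map(watched))
```

Both users have a mean of 3.0 here, so user 1's ratings become 1.0 and -1.0 and user 2's become 2.0, 0.0 and -2.0. Either version replaces the per-user loop with a single grouped pass over the data.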
Upvotes: 1