Labo

How to speed up DataFrame.loc?

I want to center the ratings given by each user, i.e. subtract that user's mean rating from each of their ratings, using this code:

for u in users:
    watched.loc[watched.user_id == u, 'rating'] -= watched.loc[watched.user_id == u, 'rating'].mean()

I have about 2,000 users with 200,000 ratings in total, and the code above takes about 20 seconds.
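A minimal, self-contained version of the loop, assuming a hypothetical toy `watched` frame with only `user_id` and `rating` columns, and assuming `users` comes from `watched['user_id'].unique()` (the question does not show how `users` is built). After centering, each user's ratings average to zero:

```python
import pandas as pd

# Hypothetical toy data standing in for `watched`; only the user_id and
# rating columns from the question are assumed.
watched = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "rating": [4.0, 2.0, 5.0, 1.0, 2.0, 3.0],
})
users = watched["user_id"].unique()  # assumption: one entry per distinct user

# The question's loop: every iteration builds a boolean mask over all rows,
# so the total work grows with (number of users) x (number of rows).
for u in users:
    mask = watched.user_id == u
    watched.loc[mask, "rating"] -= watched.loc[mask, "rating"].mean()

print(watched["rating"].tolist())  # [1.0, -1.0, 0.0, -1.0, 0.0, 1.0]
```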

If I try indexing by user_id instead,

watched.set_index('user_id', inplace=True)

I get this error:

ValueError: Must have equal len keys and value when setting with an iterable

Answers (1)

akuiper

Looping over users and filtering the whole frame on every iteration is slow. The canonical pandas approach is to group by the column you want to split on, compute the mean of each group, and then apply the update in a single vectorized step. To keep the result the same length as the original frame, use groupby.transform to compute the mean per user_id, then subtract it from the rating column:
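To see why `transform` keeps the length while a plain aggregation does not, here is a comparison on a hypothetical toy frame. The sketch writes the grouping as `watched.groupby('user_id')['rating']`, which is equivalent to the `watched.rating.groupby(watched.user_id)` form used in this answer:

```python
import pandas as pd

# Hypothetical toy data with the question's column names.
watched = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "rating": [4.0, 2.0, 5.0, 1.0, 2.0, 3.0],
})

# Aggregation: one value per group, indexed by user_id (3 rows here).
per_user = watched.groupby("user_id")["rating"].mean()

# transform: the group mean broadcast back to every original row (6 rows),
# aligned on the original index, so it can be subtracted directly.
per_row = watched.groupby("user_id")["rating"].transform("mean")

print(per_user.tolist())  # [3.0, 5.0, 2.0]
print(per_row.tolist())   # [3.0, 3.0, 5.0, 2.0, 2.0, 2.0]
```

Because `per_row` shares the index of `watched`, subtracting it from `watched['rating']` lines up row by row with no loop.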

watched.rating -= watched.rating.groupby(watched.user_id).transform('mean')
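A rough check that the vectorized line gives the same result as the loop, using hypothetical random data much smaller than the question's 200,000 ratings so the loop finishes quickly. The answer's line is written here with bracket indexing and `DataFrame.groupby`, which computes the same values:

```python
import numpy as np
import pandas as pd

# Hypothetical random data; sizes are kept small so the slow loop finishes quickly.
rng = np.random.default_rng(0)
n_users, n_ratings = 50, 5_000
watched = pd.DataFrame({
    "user_id": rng.integers(0, n_users, size=n_ratings),
    "rating": rng.integers(1, 6, size=n_ratings).astype(float),
})

# Slow version from the question, run on a copy.
looped = watched.copy()
for u in looped["user_id"].unique():
    mask = looped["user_id"] == u
    looped.loc[mask, "rating"] -= looped.loc[mask, "rating"].mean()

# Vectorized version from the answer, run on another copy.
vectorized = watched.copy()
vectorized["rating"] -= vectorized.groupby("user_id")["rating"].transform("mean")

# Both approaches should agree up to floating-point rounding.
print(np.allclose(looped["rating"], vectorized["rating"]))  # True
```

The vectorized form computes all group means in one grouped pass instead of building a full-length boolean mask for every user, which is where the loop spends its time.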
