Reputation: 939
I need to normalize the rating column below by subtracting each user's mean rating, using the following process:
I have this data frame:
           user  rating
review_id
a             1       5
b             2       3
c             1       3
d             1       4
e             3       4
f             2       2
...
I then calculate the mean for each user:
>>> data.groupby('user').rating.mean()
user
1    4
2    2.5
3    4
I need the final result to be:
           user  rating
review_id
a             1       1
b             2     0.5
c             1      -1
d             1       0
e             3       0
f             2    -0.5
...
How can I do this efficiently with a pandas DataFrame?
Upvotes: 2
Views: 1004
Reputation: 139172
You can do this by using groupby().transform(); see http://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation
In this case, group by 'user', and then for each group subtract the mean of that group (the function you supply to transform is applied to each group, but the result keeps the original index):
In [7]: data.groupby('user').transform(lambda x: x - x.mean())
Out[7]:
           rating
review_id
a             1.0
b             0.5
c            -1.0
d             0.0
e             0.0
f            -0.5
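Note that the output above contains only the rating column, because the grouping key 'user' is not returned by transform. To get the frame the question asks for, with 'user' kept, one option is to transform just the rating column and assign it back. The following is a minimal sketch that rebuilds only the six rows shown in the question; the names centered and alt are illustrative, not part of the original post:

```python
import pandas as pd

# Rebuild the six example rows from the question, with review_id as the index.
data = pd.DataFrame(
    {"user": [1, 2, 1, 1, 3, 2], "rating": [5, 3, 3, 4, 4, 2]},
    index=pd.Index(list("abcdef"), name="review_id"),
)

# Transform only the rating column and assign it to a copy,
# so the 'user' column is kept next to the centered ratings.
centered = data.copy()
centered["rating"] = data.groupby("user")["rating"].transform(lambda x: x - x.mean())

# Alternative without a Python lambda: broadcast each user's mean
# back to the original index, then subtract it from the ratings.
alt = data["rating"] - data.groupby("user")["rating"].transform("mean")

print(centered)
```

The second form passes the aggregation by name instead of calling a Python function per group, which may be faster on large frames; that is worth measuring on your own data rather than assuming.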
Upvotes: 1