Normalize values in DataFrame

Question

I need to normalize the rating column below using the following process:

Group by the user id field.
Find the mean rating for each user.
Locate each user's reviews and subtract that user's mean rating.

I have this data frame:

                user       rating
 review_id
         a      1          5
         b      2          3
         c      1          3
         d      1          4
         e      3          4
         f      2          2
...

I then calculate the mean for each user:

 >>> data.groupby('user').rating.mean()

 user
 1       4
 2       2.5
 3       4

I need the final result to be:

                user       rating
 review_id
         a      1          1
         b      2          0.5
         c      1          -1
         d      1          0
         e      3          0
         f      2          -0.5
...

How can I do this efficiently with a DataFrame?

joris · Accepted Answer

You can do this with groupby().transform(); see http://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation

In this case, group by 'user' and then, for each group, subtract the mean of that group. The function you supply to transform is applied to each group, but the result keeps the original index:

In [7]: data.groupby('user').transform(lambda x: x - x.mean())
Out[7]:
           rating
review_id
a             1.0
b             0.5
c            -1.0
d             0.0
e             0.0
f            -0.5
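
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the six sample rows from the question (the construction of `data` is my assumption about how the frame was created, and `user_mean` and `normalized` are names I made up for illustration). It uses the built-in 'mean' aggregation in transform instead of a lambda and writes the result back with assign, so the user column is kept as in the desired output:

```python
import pandas as pd

# Sample rows copied from the question; review_id is the index.
data = pd.DataFrame(
    {"user": [1, 2, 1, 1, 3, 2], "rating": [5, 3, 3, 4, 4, 2]},
    index=pd.Index(list("abcdef"), name="review_id"),
)

# Per-user mean, broadcast back onto the original rows (same index as data).
user_mean = data.groupby("user")["rating"].transform("mean")

# Subtract each user's mean from that user's ratings; the user column is untouched.
normalized = data.assign(rating=data["rating"] - user_mean)
print(normalized)
```

Passing the string 'mean' lets pandas use its own grouped mean rather than calling a Python function once per group, which usually helps when there are many users. The lambda version above gives the same numbers.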
