Mini Fridge

Normalize values in a DataFrame

What I need is to normalize the rating column below using the following process:

  1. Group by the user field (id).
  2. Find the mean rating for each user.
  3. For each review, subtract that user's mean rating from the review's rating.
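
The three steps above can be written out directly in pandas. This is a minimal sketch, assuming the frame holds only the six rows shown below and is indexed by review_id; the name user_means is illustrative. Series.map looks up each review's user id in the per-user means, and the subtraction then lines up row by row on review_id.

```python
import pandas as pd

# The six sample rows from the question, indexed by review_id
data = pd.DataFrame(
    {"user": [1, 2, 1, 1, 3, 2], "rating": [5, 3, 3, 4, 4, 2]},
    index=pd.Index(list("abcdef"), name="review_id"),
)

# Steps 1 and 2: group by user and compute each user's mean rating
user_means = data.groupby("user")["rating"].mean()

# Step 3: look up each review's user mean and subtract it from the rating
data["rating"] = data["rating"] - data["user"].map(user_means)

print(data)
```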

I have this data frame:

           user  rating
review_id
a             1       5
b             2       3
c             1       3
d             1       4
e             3       4
f             2       2
...

I then calculate the mean for each user:

>>> data.groupby('user').rating.mean()

user
1    4.0
2    2.5
3    4.0

I need the final result to be:

           user  rating
review_id
a             1       1
b             2     0.5
c             1      -1
d             1       0
e             3       0
f             2    -0.5
...

How can I do this efficiently with pandas DataFrames?

Answers (1)

joris

You can do this using groupby().transform(); see http://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation

Here, group by 'user' and then, for each group, subtract that group's mean. The function you pass to transform is applied to each group, but the result keeps the original index:

In [7]: data.groupby('user').transform(lambda x: x - x.mean())
Out[7]:
           rating
review_id
a             1.0
b             0.5
c            -1.0
d             0.0
e             0.0
f            -0.5
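
If the user column should stay in the result, as in the desired output in the question, the centered ratings can be assigned back to the frame. The following is a minimal sketch, again built from the six sample rows; the name group_mean is illustrative. It passes the string 'mean' to transform instead of a lambda, which lets pandas use its built-in group mean rather than calling a Python function once per group, and that is typically faster when there are many users.

```python
import pandas as pd

# The six sample rows from the question, indexed by review_id
data = pd.DataFrame(
    {"user": [1, 2, 1, 1, 3, 2], "rating": [5, 3, 3, 4, 4, 2]},
    index=pd.Index(list("abcdef"), name="review_id"),
)

# transform('mean') returns a Series aligned with data's index,
# holding the mean rating of each row's user
group_mean = data.groupby("user")["rating"].transform("mean")

# Subtract the per-user mean and store it back, keeping the 'user' column
data["rating"] = data["rating"] - group_mean

print(data)
```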
