SaNa
SaNa

Reputation: 343

diffrence from the max in each group

I have a DataFrame like this:

df = pd.DataFrame({'id':['pt1','px1','t95','sx1','dc4','px5'],
'group':['f7','f7', 'f7','f8','f8','f8'],
'score':['2','3.3','4','8','4.9','6']})

I want to add another column and calculate the difference between each score in each group with the maximum score of that group. The expected result would be:

group id  score  score_diff
f7  pt1     2         -2
f7  px1   3.3        -.7
f7  t95     4          0
f8  sx1     8          0
f8  dc4   4.9        -3.1
f8  px5     6         -2

Would appreciate if you could please help. I want to run the code on 2000+ records. Below is my code but it gives me score difference from the previous record in each group. however, I want to calculate the score difference from max score in each group.

result = df.groupby(['fk'])['score'].diff()

Upvotes: 2

Views: 92

Answers (1)

Mohamed Ali JAMAOUI
Mohamed Ali JAMAOUI

Reputation: 14699

You can use groupby and transform to generate the max for each row of the dataframe, and then use simple substraction operation to get the desired output. Here's a working example.

In [1]: df
Out[1]: 
  group   id score
0    f7  pt1     2
1    f7  px1   3.3
2    f7  t95     4
3    f8  sx1     8
4    f8  dc4   4.9
5    f8  px5     6

In[2]: df['score'] = df.score.astype(float)

In[3]: df['score_diff'] = df.score - df.groupby('group')['score'].transform('max')

In[4]: df
Out[4]: 
  group   id  score  score_diff
0    f7  pt1    2.0        -2.0
1    f7  px1    3.3        -0.7
2    f7  t95    4.0         0.0
3    f8  sx1    8.0         0.0
4    f8  dc4    4.9        -3.1
5    f8  px5    6.0        -2.0

Note:

  • Please make sure that score column is a numeric (e.g: float), I used df.score.astype(float) to convert it to float.

Upvotes: 2

Related Questions