Eric
Eric

Reputation: 13

Selecting rows in a dataframe where columns match and taking max

So this is a simplified version of my full problem, but will hopefully help more people if it's more generic.

So I'm using a Pandas DataFrame that's arbitrarily big (large enough to not write a bunch of simple conditionals). Say it looks something like this:

member    group    score
1         1        56
1         1        432
1         1        43 
2         1        44
2         1        555
2         2        90
2         2        101

And say this list goes on for quite a while. My goal is to compare only the score of rows where they have both the same member and group as another row, and take not only the max of those rows but also how much of a max it is and store it in a new data frame. For example, the finished data frame would look like:

member    group    max    max by
1         1        432    376
2         1        555    511
2         2        101    11

I have no idea and I have not found any hint as to how to compare rows like that without saying df['member'==1], but there are too many different values for member and group for me to do this. Thank you in advance!

Upvotes: 0

Views: 68

Answers (4)

jezrael
jezrael

Reputation: 863166

I think delete column first is not necessary, cleaner is rename it after assign subtracted columns:

df = (df.groupby(['member', 'group'])['score']
       .agg(['max', 'first'])
       .assign(first = lambda x: x['max'] - x['first'])
       .rename(columns={'first':'max by'})
       .reset_index())
print (df)
   member  group  max  max by
0       1      1  432     376
1       2      1  555     511
2       2      2  101      11

Upvotes: 0

BENY
BENY

Reputation: 323316

Using np.ptp import pandas as pd import numpy as np

df.groupby(['member','group'])['score'].agg({'max':'max','max by':np.ptp}).reset_index()
Out[8]: 
   member  group  max  max by
0       1      1  432     389
1       2      1  555     511
2       2      2  101      11

EDIT : I will keep the "worng" one here :) cause I like this np.ptp

Here you go :~)

df.groupby(['member','group'])['score'].agg({'max':'max','max by':lambda g: g.max() - g.iloc[0]}).reset_index()
Out[17]: 
   member  group  max  max by
0       1      1  432     376
1       2      1  555     511
2       2      2  101      11

Upvotes: 1

cs95
cs95

Reputation: 402814

Similar to DYZ's answer, a little cleaner.

df.groupby(['member', 'group']).score.agg(['max', 'first'])
df = df.assign(max_by=df.diff(-1, axis=1)['max'])\
                         .drop('first', 1).reset_index()
df

   member  group  max  max_by
0       1      1  432   376.0
1       2      1  555   511.0
2       2      2  101    11.0

Upvotes: 1

DYZ
DYZ

Reputation: 57075

As far as I understand, you want to know the max in each group and by how much the max is greater than the score in the first row of the group:

df1 = df.groupby(["group", "member"]).agg(["first", "max"]).reset_index()
df1.columns = "member", "group", "first", "max"
df1["max by"] = df1["max"] - df1["first"]
#   member  group  first  max  max by
#0       1      1     56  432     376
#1       1      2     44  555     511
#2       2      2     90  101      11

Upvotes: 1

Related Questions