Reputation: 13
So this is a simplified version of my full problem, but will hopefully help more people if it's more generic.
So I'm using a Pandas DataFrame that's arbitrarily big (large enough to not write a bunch of simple conditionals). Say it looks something like this:
member group score
1 1 56
1 1 432
1 1 43
2 1 44
2 1 555
2 2 90
2 2 101
And say this list goes on for quite a while. My goal is to compare only the score of rows where they have both the same member and group as another row, and take not only the max of those rows but also how much of a max it is and store it in a new data frame. For example, the finished data frame would look like:
member group max max by
1 1 432 376
2 1 555 511
2 2 101 11
I have no idea and I have not found any hint as to how to compare rows like that without saying df['member'==1]
, but there are too many different values for member and group for me to do this. Thank you in advance!
Upvotes: 0
Views: 68
Reputation: 863166
I think delete column first
is not necessary, cleaner is rename it after assign
subtracted columns:
df = (df.groupby(['member', 'group'])['score']
.agg(['max', 'first'])
.assign(first = lambda x: x['max'] - x['first'])
.rename(columns={'first':'max by'})
.reset_index())
print (df)
member group max max by
0 1 1 432 376
1 2 1 555 511
2 2 2 101 11
Upvotes: 0
Reputation: 323316
Using np.ptp
import pandas as pd
import numpy as np
df.groupby(['member','group'])['score'].agg({'max':'max','max by':np.ptp}).reset_index()
Out[8]:
member group max max by
0 1 1 432 389
1 2 1 555 511
2 2 2 101 11
EDIT : I will keep the "worng" one here :) cause I like this np.ptp
Here you go :~)
df.groupby(['member','group'])['score'].agg({'max':'max','max by':lambda g: g.max() - g.iloc[0]}).reset_index()
Out[17]:
member group max max by
0 1 1 432 376
1 2 1 555 511
2 2 2 101 11
Upvotes: 1
Reputation: 402814
Similar to DYZ's answer, a little cleaner.
df.groupby(['member', 'group']).score.agg(['max', 'first'])
df = df.assign(max_by=df.diff(-1, axis=1)['max'])\
.drop('first', 1).reset_index()
df
member group max max_by
0 1 1 432 376.0
1 2 1 555 511.0
2 2 2 101 11.0
Upvotes: 1
Reputation: 57075
As far as I understand, you want to know the max in each group and by how much the max is greater than the score in the first row of the group:
df1 = df.groupby(["group", "member"]).agg(["first", "max"]).reset_index()
df1.columns = "member", "group", "first", "max"
df1["max by"] = df1["max"] - df1["first"]
# member group first max max by
#0 1 1 56 432 376
#1 1 2 44 555 511
#2 2 2 90 101 11
Upvotes: 1