Reputation: 1601
I have a pandas DataFrame with the following structure, for example:
id,sex,age,rank,skill
1,M,9,1,A
1,M,8,2,G
1,M,10,3,F
2,F,10,3,M
2,F,8,4,W
2,F,6,4,O
3,M,5,1,Q
3,M,4,3,N
3,M,9,4,Y
My desired output after the groupby/apply operation on the DataFrame is:
id,sex,age,rank,skill
1,M,8,1,A
2,F,6,3,M
3,M,4,1,Q
In other words, I want to group by the id field, keep the sex field (which does not change within a group), take the min() of age, the min() of rank, and the skill value from the row where rank is at its min().
I understand that multiple agg functions can be passed to the groupby in a dict, but I do not understand how to handle values that are constant, or that depend on the result of a function applied to another field of the groupby.
Upvotes: 1
Views: 707
Reputation: 4345
+1 for Wen.
Mine has a few more steps but it's the same idea and perhaps easier to read if you're not following:
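import pandas as pd

# Aggregation to apply to each column within an id group.
func = {'sex': 'min', 'age': 'min', 'rank': 'min'}
df_agg = df.groupby('id').agg(func)
df_agg = df_agg.reset_index()
# Keyword form; the positional axis argument was removed in pandas 2.0.
df = df.drop(columns='age')
df = pd.merge(df_agg, df, on=['id', 'sex', 'rank'])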
Set the aggregations you want to apply to each column. Then group by id, using agg. You need to reset the index at this point or else you won't be able to perform the merge in the next step, as id will be treated as the index.
df still stores your original data-frame. Drop age from df, as you'll only need the minimized age, stored in df_agg. Then perform the merge on the columns you'd expect to match: id, sex and rank. You are merging on rank to pull the correct skill along for the ride.
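For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample data is copied from the question; the names lookup and result are only illustrative and are not part of the original snippet.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'sex':   ['M', 'M', 'M', 'F', 'F', 'F', 'M', 'M', 'M'],
    'age':   [9, 8, 10, 10, 8, 6, 5, 4, 9],
    'rank':  [1, 2, 3, 3, 4, 4, 1, 3, 4],
    'skill': ['A', 'G', 'F', 'M', 'W', 'O', 'Q', 'N', 'Y'],
})

# Step 1: per-id aggregates; reset_index turns id back into a column.
func = {'sex': 'min', 'age': 'min', 'rank': 'min'}
df_agg = df.groupby('id').agg(func).reset_index()

# Step 2: drop the per-row age so only the aggregated age survives the merge.
lookup = df.drop(columns='age')  # "lookup" is an illustrative name

# Step 3: merge on id, sex and rank to pull in the skill at the minimum rank.
result = pd.merge(df_agg, lookup, on=['id', 'sex', 'rank'])
print(result)
```

One thing to keep in mind: if several rows in a group share the minimum rank, the merge returns one row for each of them, so you may want to de-duplicate afterwards.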
Upvotes: 1
Reputation: 323396
In your expected output it is the min of rank, but in your explanation you mentioned it is the max. My answer is based on your expected output:
df.groupby(['id','sex'],as_index=False).agg({'age':'min','rank':'min'}).\
merge(df.drop(columns='age'),on=['id','sex','rank'],how='left')
Out[931]:
   id sex  age  rank skill
0   1   M    8     1     A
1   2   F    6     3     M
2   3   M    4     1     Q
Upvotes: 3
Reputation:
For columns that have constant values, you have several options: first, last, etc. For the skill value that corresponds to the highest rank (the minimum rank value in your example), you need to use idxmin. idxmin returns the index label of the row holding the minimum, so skill should be the index; set it as the index as a first step.
df.set_index('skill').groupby('id').agg({'sex': 'first',
                                         'age': 'min',
                                         'rank': ['min', 'idxmin']})
Out:
     sex age rank
   first min  min idxmin
id
1      M   8    1      A
2      F   6    3      M
3      M   4    1      Q
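If the two-level column header is inconvenient, a variation on the same idxmin idea is to use named aggregation, which yields flat column names directly. This is only a sketch: it assumes pandas 0.25 or later (where named aggregation is available) and that the skill values are unique, so that the label returned by idxmin is unambiguous. The output column names are my own choice.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'sex':   ['M', 'M', 'M', 'F', 'F', 'F', 'M', 'M', 'M'],
    'age':   [9, 8, 10, 10, 8, 6, 5, 4, 9],
    'rank':  [1, 2, 3, 3, 4, 4, 1, 3, 4],
    'skill': ['A', 'G', 'F', 'M', 'W', 'O', 'Q', 'N', 'Y'],
})

out = (
    df.set_index('skill')            # idxmin will now return skill labels
      .groupby('id')
      .agg(
          sex=('sex', 'first'),      # constant within each group
          age=('age', 'min'),
          rank=('rank', 'min'),
          skill=('rank', 'idxmin'),  # skill label at the minimum rank
      )
      .reset_index()                 # turn id back into a regular column
)
print(out)
```

With the sample data this should give one row per id, with the same columns as the desired output in the question.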
Upvotes: 1