Reputation: 4827
I have a dataframe like:
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})
And I would like to group this by year and label and return, for each group, the row with the lowest value in the value column. The row has to include the (original) index value as well as all the columns from df.
So, the result should look like this:
year label cat value
3 2018 A 555 1
1 2018 B 546 5
2 2018 C 671 6
5 2018 D 229 4
6 2019 A 811 7
10 2019 E 701 3
8 2019 F 227 13
9 2019 G 341 11
My code so far:
df.groupby(by=['year', 'label']).min().reset_index()
This has the correct values for value, but not for cat. Also, the (original) index is missing.
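The reason min() goes wrong is that it takes the minimum of each column independently within a group, so cat and value in one output row can come from different input rows. A quick check on the (2019, E) group, using the same data as above, makes this visible:

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# min() is applied column by column inside each (year, label) group
agg = df.groupby(['year', 'label']).min()
print(agg.loc[(2019, 'E')])  # cat 508, value 3

# but the row that actually holds value == 3 in that group has cat 701
print(df[(df['year'] == 2019) & (df['label'] == 'E') & (df['value'] == 3)])
```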
Any suggestions on how to solve this?
Upvotes: 1
Views: 1246
Reputation: 150765
You can do it with idxmin():
df.loc[df.groupby(['year','label']).value.idxmin()]
Output:
year label cat value
3 2018 A 555 1
1 2018 B 546 5
2 2018 C 671 6
5 2018 D 229 4
6 2019 A 811 7
10 2019 E 701 3
8 2019 F 227 13
9 2019 G 341 11
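It may help to look at the intermediate step: idxmin() returns, for each (year, label) group, the index label of the row with the smallest value, and df.loc then selects those rows unchanged. That is why both the original index and the matching cat survive. A small sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# Series indexed by (year, label); each entry is an ORIGINAL row label of df
idx = df.groupby(['year', 'label'])['value'].idxmin()
print(idx.tolist())  # [3, 1, 2, 5, 6, 10, 8, 9]

# selecting by those labels keeps every column and the original index
result = df.loc[idx]
print(result)
```

Note that idxmin returns only the first occurrence of the minimum, so if two rows in a group tie on value, only one of them is returned.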
Upvotes: 3
Reputation: 3739
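1. First, find the minimum value of each (year, label) group:
res = df.groupby(['year', 'label'], as_index=False)['value'].min()
2. Then merge it back with the original df on all three keys, so that only the rows holding the minimum are kept (reset_index keeps the original index as a column, set_index restores it):
final_result = df.reset_index().merge(res, on=['year', 'label', 'value'], how='inner').set_index('index')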
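The same "find the minimum, then match it back" idea can also be written without a merge by using transform('min'). This is a swapped-in alternative, not the approach of this answer. transform broadcasts each group's minimum back onto every original row, so a boolean mask keeps the original index and all columns as they are. Unlike idxmin, this keeps every row that ties for the minimum.

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# group minimum aligned to the original rows (same index as df)
group_min = df.groupby(['year', 'label'])['value'].transform('min')

# keep only rows whose value equals the minimum of their group
final_result = df[df['value'] == group_min]
print(final_result)
```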
Upvotes: 0
Reputation: 402603
This is a good indication that you want to sort and then call drop_duplicates, since a groupby aggregation replaces the original index with the group keys.
# This is a little simpler but the order will change.
# df.sort_values('value').drop_duplicates(['year', 'label'])
df.sort_values(['year', 'label', 'value']).drop_duplicates(['year', 'label'])
year label cat value
3 2018 A 555 1
1 2018 B 546 5
2 2018 C 671 6
5 2018 D 229 4
6 2019 A 811 7
10 2019 E 701 3
8 2019 F 227 13
9 2019 G 341 11
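If you prefer the shorter variant from the comment but want the original row order back, sort_index() can be chained at the end. One caveat: sort_values defaults to quicksort, which is not stable, so when two rows in a group tie on value, which one survives is not guaranteed; passing kind='stable' should make the earlier row (in original order) win. A sketch under those assumptions:

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# sort by value only (stable, so ties keep their original order),
# keep the first, i.e. smallest, row per (year, label),
# then restore the original row order
result = (df.sort_values('value', kind='stable')
            .drop_duplicates(['year', 'label'])
            .sort_index())
print(result)
```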
Upvotes: 2