René

Reputation: 4827

python pandas groupby aggregate with min function

I have a dataframe like:

import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

I would like to group this by year and label and, for each group, return the row with the lowest entry in the value column. Each returned row has to include its original index as well as all the columns from df.

So, the result should be like:

 3  2018  A  555   1
 1  2018  B  546   5
 2  2018  C  671   6
 5  2018  D  229   4
 6  2019  A  811   7
10  2019  E  701   3
 8  2019  F  227  13
 9  2019  G  341  11

My code so far:

df.groupby(by=['year', 'label']).min().reset_index()

This has the correct values for value, but not for cat. Also the (original) index is missing.
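A minimal sketch of why this happens, assuming the df above: min() is applied to each column independently, so the cat and value in one result row can come from different original rows. The variable names per_column_min and row_2018_a are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# min() is computed per column, so every cell is an independent minimum
per_column_min = df.groupby(by=['year', 'label']).min().reset_index()

# Group (2018, 'A') consists of row 0 (cat=236, value=2) and row 3 (cat=555, value=1)
mask = (per_column_min['year'] == 2018) & (per_column_min['label'] == 'A')
row_2018_a = per_column_min[mask].iloc[0]

# cat is taken from row 0 and value from row 3, so no original row matches
print(row_2018_a['cat'], row_2018_a['value'])
```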

Any suggestions on how to solve this?

Upvotes: 1

Views: 1246

Answers (3)

Quang Hoang

Reputation: 150765

You can do this with idxmin(), which returns the index label of the minimum value in each group:

df.loc[df.groupby(['year','label']).value.idxmin()]

Output:

    year label  cat  value
3   2018     A  555      1
1   2018     B  546      5
2   2018     C  671      6
5   2018     D  229      4
6   2019     A  811      7
10  2019     E  701      3
8   2019     F  227     13
9   2019     G  341     11
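For reference, a self-contained sketch of the same idea using the df from the question. It selects the column with brackets (['value']) rather than attribute access, which is only a stylistic choice here; min_idx and result are illustrative names. Note that when a group has several rows tied for the minimum, idxmin() returns the first one only.

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# For each (year, label) group, get the original index label of the smallest value
min_idx = df.groupby(['year', 'label'])['value'].idxmin()

# Look those labels up in df, which keeps the original index and all columns
result = df.loc[min_idx]
print(result)
```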

Upvotes: 3

tawab_shakeel

Reputation: 3739

1. First, find the minimum value for each group:
res = df.groupby(['year', 'label'], as_index=False)['value'].min()

2. Then merge with the original df on the group keys and value, so that only the rows holding the minimum remain:
final_result = pd.merge(df, res, on=['year', 'label', 'value'], how='inner')
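A plain merge returns a fresh 0..n-1 index, so the original index is lost. One possible way to keep it, sketched here with the df from the question (lowercase column names, and value included in the merge keys so only minimum rows match), is to move the index into a column before merging and restore it afterwards. The reset_index/set_index steps and the final sort are additions for illustration, not part of the steps above. If a group has ties for the minimum, this keeps all tied rows.

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

# Step 1: minimum value per (year, label) group
res = df.groupby(['year', 'label'], as_index=False)['value'].min()

# Step 2: merge on keys and value; carry the original index through as a column
final_result = (
    df.reset_index()                      # original index becomes a column named 'index'
      .merge(res, on=['year', 'label', 'value'], how='inner')
      .set_index('index')                 # restore the original index
      .rename_axis(None)                  # drop the leftover index name
      .sort_values(['year', 'label'])     # same row order as the expected result
)
print(final_result)
```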

Upvotes: 0

cs95

Reputation: 402603

This is a good indication that you want to sort and then call drop_duplicates, since groupby destroys the original index.

# This is a little simpler but the order will change.
# df.sort_values('value').drop_duplicates(['year', 'label'])

df.sort_values(['year', 'label', 'value']).drop_duplicates(['year', 'label'])

    year label  cat  value
3   2018     A  555      1
1   2018     B  546      5
2   2018     C  671      6
5   2018     D  229      4
6   2019     A  811      7
10  2019     E  701      3
8   2019     F  227     13
9   2019     G  341     11

Upvotes: 2
