Pandas groupby on one column, aggregate on second column, preserve third column

Question

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'key1': (1,1,1,2), 'key2': (1,2,3,1), 'data1': ("test","test2","t","test")})

I want to group by key1 and take the min of data1. I also want to keep the corresponding value of key2, without grouping on it.

df.groupby(['key1'], as_index=False)['data1'].min()

gets me:

key1 data1  
1    t  
2    test

but I need:

key1 key2 data1  
1    3    t  
2    1    test

Any ideas?

Nickil Maveli · Accepted Answer

You can use groupby.apply and, within each group, keep every row where x['data1'] == x['data1'].min() is True. This preserves the non-grouped columns:

df.groupby('key1', group_keys=False).apply(lambda x: x[x['data1'].eq(x['data1'].min())])

To see which rows evaluate to True, and are therefore kept when the DataFrame is subset:

df.groupby('key1').apply(lambda x: x['data1'].eq(x['data1'].min()))

key1   
1     0    False
      1    False
      2     True
2     3     True
Name: data1, dtype: bool
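
The same row selection can also be written without apply. The following is a minimal sketch, not part of the original answer: it assumes the df from the question and uses groupby(...).transform('min') to broadcast each group's minimum back onto the original rows, then filters with a boolean mask. The names group_min, result and one_per_group are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': (1, 1, 1, 2),
    'key2': (1, 2, 3, 1),
    'data1': ("test", "test2", "t", "test"),
})

# Broadcast each group's minimum of data1 back onto the original rows,
# so group_min has the same length and index as df.
group_min = df.groupby('key1')['data1'].transform('min')

# Keep the rows that hold their group's minimum. If several rows in a
# group tie for the minimum, all of them are kept (same as the apply version).
result = df[df['data1'] == group_min]

# Variant (an assumption about what you may want): keep exactly one row
# per key1 by sorting and dropping later duplicates of the key.
one_per_group = df.sort_values(['key1', 'data1']).drop_duplicates('key1')

print(result)
print(one_per_group)
```

Both variants keep key1 as an ordinary column, so they do not depend on how apply treats the grouping column, which may differ between pandas versions. Pick the mask version if ties should be preserved and the sort version if you need one row per group.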
