stack_lech

Reputation: 1080

Pandas groupby on one column, aggregate on second column, preserve third column

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'key1': (1,1,1,2), 'key2': (1,2,3,1), 'data1': ("test","test2","t","test")})

I want to group by key1 and take the min of data1. In addition, I want to keep the corresponding value of key2 without grouping on it.

df.groupby(['key1'], as_index=False)['data1'].min()

gets me:

key1 data1  
1    t  
2    test  

but I need:

key1 key2 data1  
1    3    t  
2    1    test  

Any ideas?

Upvotes: 2

Views: 639

Answers (1)

Nickil Maveli

Reputation: 29711

You can use groupby.apply and, within each group, keep every row where x['data1'] == x['data1'].min() is True. This preserves the non-grouped columns:

df.groupby('key1', group_keys=False).apply(lambda x: x[x['data1'].eq(x['data1'].min())])


To see which rows evaluate to True, and are therefore kept in the reduced DataFrame, inspect the boolean mask on its own:

df.groupby('key1').apply(lambda x: x['data1'].eq(x['data1'].min()))

key1   
1     0    False
      1    False
      2     True
2     3     True
Name: data1, dtype: bool
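
As a possible alternative to apply, the same mask can be sketched with groupby.transform, which broadcasts each group's minimum back onto the original rows. This is only an illustration built on the question's sample frame; the names group_min, result and one_per_group are made up for the example, and the second variant (one row per key even when the minimum is tied) is an assumption about what may be wanted, not something the question states.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': (1, 1, 1, 2),
    'key2': (1, 2, 3, 1),
    'data1': ("test", "test2", "t", "test"),
})

# Broadcast each group's minimum of data1 back onto the original rows.
group_min = df.groupby('key1')['data1'].transform('min')

# Keep the rows whose data1 equals their group's minimum.
# If a group has tied minima, all tied rows are kept (same as the apply version).
result = df[df['data1'].eq(group_min)]
print(result)

# Variant (assumption: exactly one row per key1 is wanted, even with ties):
# sort by data1 with a stable sort, then keep the first row seen for each key1.
one_per_group = (
    df.sort_values('data1', kind='mergesort')
      .drop_duplicates('key1')
      .sort_values('key1')
)
print(one_per_group)
```

Both variants keep key2 without grouping on it. The transform version avoids calling a Python lambda per group, which may matter on larger frames.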

Upvotes: 2
