meenaparam

Reputation: 2019

pandas groupby - return the first row in a group that meets a condition

Given the example dataset below, I would like to return one row per group that shows the obsnum of the first row with a score less than 0.4.

import pandas as pd
import numpy as np

np.random.seed(42)

df = pd.DataFrame({'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'obsnum': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'score': np.random.rand(12)})

The dataframe looks like this:

df
   group  obsnum     score
0      a       1  0.374540
1      a       2  0.950714
2      a       3  0.731994
3      a       4  0.598658
4      b       1  0.156019
5      b       2  0.155995
6      b       3  0.058084
7      b       4  0.866176
8      c       1  0.601115
9      c       2  0.708073
10     c       3  0.020584
11     c       4  0.969910

The result should look like this and be stored in a separate DataFrame:

group  obsnum     score
    a       1  0.374540
    b       1  0.156019
    c       3  0.020584

I have tried df.groupby('group').apply(lambda x: x['score'] <= 0.4) and df.groupby('group')[['obsnum', 'score']].min(), but neither gives what I am after.
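A quick check of the second attempt shows why it falls short: min() reduces each column independently, so obsnum and score in a result row can come from different original rows. In the sketch below (which rebuilds the question's df; the name agg is only illustrative), group b pairs obsnum 1 with the score that actually belongs to obsnum 3. The double-bracket column list is used because recent pandas versions raise an error for a bare tuple of column names.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
np.random.seed(42)
df = pd.DataFrame({'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'obsnum': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'score': np.random.rand(12)})

# min() is computed column by column within each group, so the two
# values in a result row need not come from the same original row.
agg = df.groupby('group')[['obsnum', 'score']].min()
print(agg)
```

Every group reports obsnum 1 here, even though the lowest scores in groups b and c sit at obsnum 3.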

Upvotes: 2

Views: 1655

Answers (2)

Bharath M Shetty

Reputation: 30605

You can filter the qualifying rows first and then take the first row of each group:

df[df['score'].le(0.4)].groupby('group').first()

       obsnum     score
group                  
a           1  0.374540
b           1  0.156019
c           3  0.020584
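One nuance: GroupBy.first() returns the first non-null value of each column separately, which coincides with the first qualifying row here only because the data has no missing values. If NaN can appear, GroupBy.head(1) returns whole rows instead. A minimal sketch, assuming the question's df; reset_index() turns group back into an ordinary column so the result matches the requested layout, and the names mask, result and rows are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'obsnum': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'score': np.random.rand(12)})

# Keep only rows that meet the condition; row order within each group is preserved.
mask = df['score'].le(0.4)

# first() picks the first non-null value per column within each group;
# reset_index() moves 'group' from the index back into a column.
result = df[mask].groupby('group').first().reset_index()

# head(1) returns the first whole row per group and keeps the original index.
rows = df[mask].groupby('group').head(1)

print(result)
print(rows)
```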

Upvotes: 6

jezrael

Reputation: 862511

You can filter first, with either boolean indexing or query, and then call drop_duplicates:

# boolean indexing
df1 = df[df['score'] <= 0.4].drop_duplicates('group')
# or, equivalently, query
df1 = df.query('score <= 0.4').drop_duplicates('group')

print(df1)
   group  obsnum     score
0      a       1  0.374540
4      b       1  0.156019
10     c       3  0.020584
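This approach relies on row order: drop_duplicates keeps the first occurrence of each group by default (keep='first'), and filtering preserves order, so the kept row is the earliest qualifying one. If the rows might not already be in obsnum order, a minimal sketch, assuming "first" means smallest obsnum, is to sort before filtering. The shuffled frame is hypothetical and only simulates unordered input; out and expected are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'obsnum': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'score': np.random.rand(12)})

# Hypothetical scenario: the rows arrive in a scrambled order.
shuffled = df.sample(frac=1, random_state=0)

# Sort so that "first occurrence" means "smallest obsnum" within each group,
# then filter and keep the first remaining row per group.
out = (shuffled.sort_values(['group', 'obsnum'])
               .query('score <= 0.4')
               .drop_duplicates('group'))

# Same filter on the original, already ordered frame for comparison.
expected = df[df['score'] <= 0.4].drop_duplicates('group')

print(out)
```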

Upvotes: 3
