Robert
Robert

Reputation: 159

Pandas - Finding rows where two or more meet a condition

For this example I have records for testing at different site locations.

df = pd.DataFrame({'test_id': [1,2,3,4,5,6,7,8, 9],
                   'name':['site1', 'site2', 'site3', 'site4', 'site1', 'site2', 'site3', 'site4', 'site5'],
                   'value':['elevated', 'ok', 'high', 'ok', 'elevated', 'elevated','elevated', 'elevated', 'ok']})


   test_id   name     value
0        1  site1  elevated
1        2  site4        ok
2        3  site2      high
3        4  site3        ok
4        5  site2  elevated
5        6  site1  elevated
6        7  site3        ok
7        8  site4  elevated
8        9  site5        ok


I need to flag rows where sites had 2 or more failed (non-ok) tests.

I approached this as needing grouping and counting and came up with the following:

df['flag'] = df.isin(df.loc[df['value'] != 'ok'].groupby(['name']).filter(lambda x: len(x) > 1))['value']

   test_id   name     value   flag
0        1  site1  elevated   True
1        2  site2        ok  False
2        3  site3      high   True
3        4  site4        ok  False
4        5  site1  elevated   True
5        6  site2  elevated  False
6        7  site3  elevated   True
7        8  site4  elevated  False
8        9  site5        ok  False

Is there a better way to approach this problem? I'm no expert in Pandas.

Upvotes: 3

Views: 37

Answers (1)

Erfan
Erfan

Reputation: 42916

You solution is fine! Although you can speed it up by getting rid of the lambda function and only use native vectorized methods.

So we first get a boolean array where all rows are not equal to ok. Then we groupby on name and get the sum for each group with transform. Finally we check which rows have a sum > 1 with Series.gt which stands for greater than eg >.

df['flag'] = df['value'].ne('ok').groupby(df['name']).transform('sum').gt(1)

   test_id   name     value   flag
0        1  site1  elevated   True
1        2  site2        ok  False
2        3  site3      high   True
3        4  site4        ok  False
4        5  site1  elevated   True
5        6  site2  elevated  False
6        7  site3  elevated   True
7        8  site4  elevated  False
8        9  site5        ok  False

Upvotes: 3

Related Questions