shr7

Reputation: 173

Combining two conditions when filtering a pandas DataFrame

I have this row filter:

idx = dataset.data.groupby(['a', 'b'])['c'].transform(max) == dataset.data['c']
dataset.data = dataset.data[idx]

For this dataset:

a | b | c | d
0 | 0 | 1 | True
0 | 0 | 2 | True
0 | 1 | 3 | True
0 | 2 | 4 | False
0 | 2 | 5 | False

I'll get:

a | b | c | d
0 | 0 | 2 | True
0 | 1 | 3 | True
0 | 2 | 5 | False

I want to extend this condition so that it only removes rows whose 'd' field is False. Rows where 'd' is True should always be kept, so for the example above I'd get:

a | b | c | d
0 | 0 | 1 | True
0 | 0 | 2 | True
0 | 1 | 3 | True
0 | 2 | 5 | False

Can someone help me add it, please?

Thanks!

Upvotes: 1

Views: 39

Answers (1)

Henry Ecker

Reputation: 35676

IIUC, you want to keep rows where 'c' is the max of its ('a', 'b') group or where 'd' is True:

import pandas as pd

df = pd.DataFrame({
    'a': [0, 0, 0, 0, 0],
    'b': [0, 0, 1, 2, 2],
    'c': [1, 2, 3, 4, 5],
    'd': [True, True, True, False, False]
})

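# True where 'c' equals the max of its (a, b) group
# (the string 'max' avoids the FutureWarning newer pandas raises for the builtin)
c1 = df.groupby(['a', 'b'])['c'].transform('max') == df['c']
# True where 'd' is True
c2 = df['d']
# Keep a row if either condition holds
idx = c1 | c2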

print(df[idx])

df[idx]:

   a  b  c      d
0  0  0  1   True
1  0  0  2   True
2  0  1  3   True
4  0  2  5  False

The one-liner:

df[df.groupby(['a', 'b'])['c'].transform('max').eq(df['c']) | df['d']]
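
If you need this in more than one place, the same mask can be wrapped in a small helper. This is only a sketch: the function name keep_group_max_or_flag and its parameters are made up for illustration, and it assumes the flag column holds plain booleans with no missing values.

```python
import pandas as pd


def keep_group_max_or_flag(df, group_cols, value_col, flag_col):
    """Keep rows holding their group's max in value_col, or whose flag_col is True.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Broadcast each group's max back onto every row of that group.
    group_max = df.groupby(group_cols)[value_col].transform("max")
    is_max = df[value_col].eq(group_max)
    # A row is dropped only if it is neither the group max nor flagged.
    return df[is_max | df[flag_col].astype(bool)]


df = pd.DataFrame({
    'a': [0, 0, 0, 0, 0],
    'b': [0, 0, 1, 2, 2],
    'c': [1, 2, 3, 4, 5],
    'd': [True, True, True, False, False],
})

result = keep_group_max_or_flag(df, ['a', 'b'], 'c', 'd')
print(result)
```

Put the other way round, the only rows removed are those where 'd' is False and 'c' is not the group max, which in the sample data is just the row with c == 4.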

Upvotes: 2
