Reputation: 173
I have this row filter:
idx = dataset.data.groupby(['a', 'b'])['c'].transform(max) == dataset.data['c']
dataset.data = dataset.data[idx]
For this dataset:
a | b | c | d
0 | 0 | 1 | True
0 | 0 | 2 | True
0 | 1 | 3 | True
0 | 2 | 4 | False
0 | 2 | 5 | False
I'll get:
a | b | c | d
0 | 0 | 2 | True
0 | 1 | 3 | True
0 | 2 | 5 | False
I want to add a condition so that the filter removes only rows whose 'd' field is False, so in the above example I'd get:
a | b | c | d
0 | 0 | 1 | True
0 | 0 | 2 | True
0 | 1 | 3 | True
0 | 2 | 5 | False
Can someone help me add it, please?
Thanks!
Upvotes: 1
Views: 39
Reputation: 35676
IIUC, keep rows where 'c' is the group max or 'd' is True:
import pandas as pd
df = pd.DataFrame({
'a': [0, 0, 0, 0, 0],
'b': [0, 0, 1, 2, 2],
'c': [1, 2, 3, 4, 5],
'd': [True, True, True, False, False]
})
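# 'c' equals the max of its (a, b) group
# (the string 'max' avoids the FutureWarning newer pandas raises for the builtin max)
c1 = df.groupby(['a', 'b'])['c'].transform('max') == df['c']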
# D is True
c2 = df.d
# Or together
idx = c1 | c2
print(df[idx])
df[idx]:
   a  b  c      d
0  0  0  1   True
1  0  0  2   True
2  0  1  3   True
4  0  2  5  False
The one-liner:
df[df.groupby(['a', 'b'])['c'].transform('max').eq(df['c']) | df['d']]
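The same filter can also be phrased the way the question words it, as "remove only rows where 'd' is False", by building a drop mask and negating it. This is a minimal sketch on the sample data above; the names is_group_max, to_drop and result are just illustrative, and by De Morgan's law it should select the same rows as the "max or d" mask:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [0, 0, 0, 0, 0],
    'b': [0, 0, 1, 2, 2],
    'c': [1, 2, 3, 4, 5],
    'd': [True, True, True, False, False],
})

# True where 'c' equals the maximum of its (a, b) group
is_group_max = df.groupby(['a', 'b'])['c'].transform('max').eq(df['c'])

# Rows to drop: 'd' is False AND 'c' is not the group max
to_drop = ~df['d'] & ~is_group_max

# Keep everything that is not marked for dropping
result = df[~to_drop]
print(result)
```

Which form to use is mostly a matter of readability: the drop mask mirrors the wording "remove rows that...", while the keep mask is shorter.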
Upvotes: 2