Reputation: 77
I have the following dataframe:
ID Days TreatmentGiven TreatmentNumber
--- ---- -------------- ---------------
1 0 False NaN
1 30 False NaN
1 40 True 1.0
1 56 True 2.0
2 0 False NaN
2 14 True 4.0
2 28 True 5.0
3 0 False NaN
3 28 True 1.0
3 56 True 2.0
4 0 False NaN
4 100 True 6.0
4 128 True 7.0
I'd like to delete all rows for an ID where an ID does not have a TreatmentNumber == 1.0, so that the result is the following:
ID Days TreatmentGiven TreatmentNumber
--- ---- -------------- ---------------
1 0 False NaN
1 30 False NaN
1 40 True 1.0
1 56 True 2.0
3 0 False NaN
3 28 True 1.0
3 56 True 2.0
What is the best way to do this?
Thank you.
Upvotes: 2
Views: 123
Reputation: 294218
groupby
and filter
df.groupby('ID').filter(lambda d: d.TreatmentNumber.eq(1).any())
ID Days TreatmentGiven TreatmentNumber
0 1 0 False NaN
1 1 30 False NaN
2 1 40 True 1.0
3 1 56 True 2.0
7 3 0 False NaN
8 3 28 True 1.0
9 3 56 True 2.0
Upvotes: 2
Reputation: 150735
You can do with groupby().transform()
:
df[df.TreatmentNumber.eq(1).groupby(df['ID']).transform('any')]
Output:
ID Days TreatmentGiven TreatmentNumber
0 1 0 False NaN
1 1 30 False NaN
2 1 40 True 1.0
3 1 56 True 2.0
7 3 0 False NaN
8 3 28 True 1.0
9 3 56 True 2.0
Upvotes: 3
Reputation: 23099
we can use isin
and a boolean to find your ids
ids = df.loc[df['TreatmentNumber'].eq(1)]['ID'].unique()
df1 = df.loc[df['ID'].isin(ids)]
print(df1)
ID Days TreatmentGiven TreatmentNumber
0 1 0 False NaN
1 1 30 False NaN
2 1 40 True 1.0
3 1 56 True 2.0
7 3 0 False NaN
8 3 28 True 1.0
9 3 56 True 2.0
Upvotes: 0