Reputation: 1792
I have the following dataframe and then a calculation to identify outliers:
import pandas as pd
#Create DF
d = {
'Day': ['M','M','M','T','T','T','W','W','W','W','W','W','W','F','F','F','F'],
'Criteria':[30,35,2.5,3,6,15,20,1.5,4,25,50,75,2,4,9,2,36]
}
df = pd.DataFrame(data=d)
#Outliers calc
lower_limit = df.Criteria.quantile(.05)
upper_limit = df.Criteria.quantile(.95)
df['Flag'] = df.Criteria.between(lower_limit, upper_limit)
df
You can see that the Day
W
contails an outlier (2 in this case).
Essentially if a Flag
equals False
I would like to drop all rows from that Day
group.
So my expected result is all 'W'
Days are removed from the table. Thanks in advance!
Upvotes: 1
Views: 113
Reputation: 323226
Try this should be fast than filter
out = df[~df.Day.isin(df.loc[~df.Flag,'Day'])]
Out[20]:
Day Criteria Flag
0 M 30.0 True
1 M 35.0 True
2 M 2.5 True
3 T 3.0 True
4 T 6.0 True
5 T 15.0 True
13 F 4.0 True
14 F 9.0 True
15 F 2.0 True
16 F 36.0 True
Upvotes: 0
Reputation: 304
I think that's what you want :
import numpy as np
df = df[df.groupby('Day').Flag.transform(np.all)]
df
Upvotes: 1
Reputation: 18306
Groupby.filter
: retains those groups of Day
s that have their Flag
s all
True:
>>> df.groupby("Day").filter(lambda d: d.Flag.all())
Day Criteria Flag
0 M 30.0 True
1 M 35.0 True
2 M 2.5 True
3 T 3.0 True
4 T 6.0 True
5 T 15.0 True
13 F 4.0 True
14 F 9.0 True
15 F 2.0 True
16 F 36.0 True
Upvotes: 1