Reputation: 167
I have a pandas DataFrame like the one below:
price,date
14570,2/5/2017
14570,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
25149.57,2/5/2017
24799.68,2/5/2017
24799.68,2/5/2017
14600,2/6/2017
14600,2/6/2017
2563000,2/6/2017
14600,2/6/2017
14800,2/6/2017
14800,2/6/2017
14600,2/6/2017
50,2/6/2017
14600,2/6/2017
14600,2/6/2017
I want to find and exclude the outliers within each day, so I tried the snippet below:
order_items = order_items[np.abs(order_items['price']-order_items['price'].mean()) <= (3*order_items['price'].std())]
# Equivalent form: keep only the rows within +3 to -3 standard deviations in the column 'price'.
order_items = order_items[~(np.abs(order_items['price']-order_items['price'].mean()) > (3*order_items['price'].std()))]
This code finds the outliers across the whole dataset, not per day. How do I find and exclude outliers for each day separately?
Upvotes: 0
Views: 27
Reputation: 863611
Use GroupBy.transform with mean and std:
order_items = order_items[np.abs(order_items['price']-order_items.groupby('date')['price'].transform('mean')) <= (3*order_items.groupby('date')['price'].transform('std'))]
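For reference, here is a minimal self-contained sketch of the same per-day filter, run against the sample data from the question. The helper name `drop_daily_outliers` and its `n_std` parameter are made up for this illustration and are not part of pandas. It only assumes the columns are named `price` and `date`, as in the question.

```python
import pandas as pd

# Sample data copied from the question.
order_items = pd.DataFrame({
    "price": [
        14570, 14570, 14570.001, 14570.001, 14570.001, 14570.001,
        25149.57, 24799.68, 24799.68,
        14600, 14600, 2563000, 14600, 14800, 14800, 14600, 50, 14600, 14600,
    ],
    "date": ["2/5/2017"] * 9 + ["2/6/2017"] * 10,
})


def drop_daily_outliers(df, n_std=3.0):
    """Keep rows whose price is within n_std standard deviations
    of the mean price for the same date (hypothetical helper)."""
    grouped = df.groupby("date")["price"]
    # transform() returns a Series aligned with df, so every row is
    # compared against the statistics of its own day.
    deviation = (df["price"] - grouped.transform("mean")).abs()
    limit = n_std * grouped.transform("std")
    # Note: a day with a single row has std = NaN, so that row is dropped.
    return df[deviation <= limit]


kept_3 = drop_daily_outliers(order_items, n_std=3.0)
kept_2 = drop_daily_outliers(order_items, n_std=2.0)
print(len(order_items), len(kept_3), len(kept_2))
```

One caveat with small groups: using the sample standard deviation (the pandas default), no value in a group of n rows can be more than (n - 1) / sqrt(n) standard deviations from the group mean, which is about 2.85 for n = 10. On this sample the 3-sigma rule therefore removes nothing, while a threshold of 2 drops only the 2563000 row. For days with few rows, a lower threshold or a median-based rule may be worth considering.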
Upvotes: 1