bibiji

Reputation: 167

How can I find and exclude outliers for each day?

I have a pandas DataFrame like the one below:

price,date
14570,2/5/2017
14570,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
25149.57,2/5/2017
24799.68,2/5/2017
24799.68,2/5/2017
14600,2/6/2017
14600,2/6/2017
2563000,2/6/2017
14600,2/6/2017
14800,2/6/2017
14800,2/6/2017
14600,2/6/2017
50,2/6/2017
14600,2/6/2017
14600,2/6/2017

I want to find and exclude outliers within each day, so I tried the snippet below:

order_items = order_items[np.abs(order_items['price']-order_items['price'].mean()) <= (3*order_items['price'].std())]
# keep only the rows that are within +3 to -3 standard deviations of the mean of the column 'price'.

order_items = order_items[~(np.abs(order_items['price']-order_items['price'].mean()) > (3*order_items['price'].std()))]

This code finds outliers across the whole dataset, not per day. How do I find and exclude outliers for each day?

Upvotes: 0

Views: 27

Answers (1)

jezrael

Reputation: 863611

Use GroupBy.transform with mean and std, which returns the per-day statistics aligned to the original rows:

prices_by_day = order_items.groupby('date')['price']
order_items = order_items[np.abs(order_items['price'] - prices_by_day.transform('mean')) <= 3 * prices_by_day.transform('std')]
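
For anyone who wants to try this on the sample rows from the question, here is a minimal, self-contained sketch. The helper name `drop_daily_outliers` and its `n_std` parameter are my own additions for illustration; they are not part of pandas or of the original snippet.

```python
import io

import pandas as pd

# Sample rows copied from the question.
CSV = """price,date
14570,2/5/2017
14570,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
14570.001,2/5/2017
25149.57,2/5/2017
24799.68,2/5/2017
24799.68,2/5/2017
14600,2/6/2017
14600,2/6/2017
2563000,2/6/2017
14600,2/6/2017
14800,2/6/2017
14800,2/6/2017
14600,2/6/2017
50,2/6/2017
14600,2/6/2017
14600,2/6/2017
"""


def drop_daily_outliers(df, n_std=3.0):
    """Keep rows whose price is within n_std standard deviations of that day's mean.

    Hypothetical helper: the name and the n_std parameter are illustrative.
    """
    prices_by_day = df.groupby("date")["price"]
    # transform() returns one value per original row, so the results
    # line up with df and can be compared element-wise.
    deviation = (df["price"] - prices_by_day.transform("mean")).abs()
    limit = n_std * prices_by_day.transform("std")
    # A day with a single row has std = NaN, so that row fails the
    # comparison and is dropped.
    return df[deviation <= limit]


order_items = pd.read_csv(io.StringIO(CSV))
kept_3 = drop_daily_outliers(order_items)             # 3-sigma rule per day
kept_2 = drop_daily_outliers(order_items, n_std=2.0)  # stricter threshold
print(len(order_items), len(kept_3), len(kept_2))
```

One caveat when running this on the sample: with the default sample standard deviation, a value in a group of n rows can be at most (n-1)/sqrt(n) standard deviations from the group mean, which is about 2.85 for n = 10. So the 3-sigma rule cannot flag anything on days with ten or fewer rows, and all 19 sample rows are kept. With `n_std=2.0` the 2563000 row is dropped, but the 50 row stays, because the large outlier inflates that day's mean and standard deviation. For small daily groups a lower threshold is probably needed.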

Upvotes: 1
