Reputation: 83
Domain: Python & Pandas
I have a time series data frame which has the total number of customers for each day for the last 10 years.
The columns are:
There are outliers in my total customers column.
I want to cap every value that lies more than 3 standard deviations above the mean, replacing it with the threshold given by the formula below.
Capped value = mean + 3 * S.D.
Upvotes: 0
Views: 83
Reputation: 4855
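You could use the .clip() method with the upper argument to cap values in the customers column at mean + 3*sd. (Older pandas versions also offered .clip_upper(), but it was deprecated in 0.24 and removed in 1.0.)
m = df['total customers'].mean()
sd = df['total customers'].std()
df['total customers'] = df['total customers'].clip(upper=m + 3*sd)
See the pandas documentation for Series.clip for details.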
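For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The date range, the random counts, and the three injected spikes are all assumptions for illustration; only the 'total customers' column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: roughly 10 years of daily counts (not the asker's real data).
rng = np.random.default_rng(0)
dates = pd.date_range("2010-01-01", periods=3650, freq="D")
df = pd.DataFrame(
    {"total customers": rng.normal(1000, 50, size=len(dates))},
    index=dates,
)
df.iloc[[10, 500, 2000], 0] = 5000  # inject three artificial outliers

col = "total customers"
m = df[col].mean()
sd = df[col].std()       # pandas uses the sample standard deviation (ddof=1)
upper = m + 3 * sd       # cap threshold: mean + 3 SD

# Count the values above the threshold before capping them.
n_above = int((df[col] > upper).sum())

# Values above the threshold are set to the threshold; all others are unchanged.
df[col] = df[col].clip(upper=upper)
```

Note that m and sd here are computed with the outliers still included, so the threshold is somewhat higher than it would be on clean data. Whether that matters depends on how extreme the outliers are.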
Upvotes: 1