zosh

Reputation: 83

Resetting outliers in a timeseries dataframe to 3 SD

Domain: Python & Pandas

I have a time series data frame which has the total number of customers for each day for the last 10 years.

The columns are:

There are outliers in my total customers column.

I want to cap any value that lies more than 3 standard deviations above the mean, resetting it to the threshold given by the formula below.

New value for an outlier above 3 SD = mean + 3 * SD

Upvotes: 0

Views: 83

Answers (1)

Craig

Reputation: 4855

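You could use the .clip() method with its upper argument to limit values in the customers column to mean + 3*sd. (The older .clip_upper() method was deprecated and has since been removed from pandas.)

# mean and sample standard deviation of the original column
m = df['total customers'].mean()
sd = df['total customers'].std()
# cap everything above mean + 3*sd; values at or below the threshold are unchanged
df['total customers'] = df['total customers'].clip(upper=m + 3*sd)

See the pandas documentation for Series.clip.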
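For anyone who wants to try the capping step in isolation, here is a minimal sketch on made-up data. The helper name cap_upper_outliers, the n_sd parameter and the sample values are assumptions for illustration only and are not from the question. The threshold is computed from the original column, outliers included.

```python
import numpy as np
import pandas as pd


def cap_upper_outliers(series: pd.Series, n_sd: float = 3.0) -> pd.Series:
    """Cap values above mean + n_sd * std (hypothetical helper, not a pandas API)."""
    # Threshold is based on the uncapped data, so the outliers themselves
    # influence both the mean and the standard deviation.
    upper = series.mean() + n_sd * series.std()
    return series.clip(upper=upper)


# Made-up daily counts: 19 ordinary days and one extreme spike.
dates = pd.date_range("2020-01-01", periods=20, freq="D")
df = pd.DataFrame(
    {"date": dates, "total customers": [100.0] * 19 + [1000.0]}
)

# Keep the threshold around so the result can be inspected.
upper = df["total customers"].mean() + 3 * df["total customers"].std()
capped = cap_upper_outliers(df["total customers"])
df["total customers"] = capped
```

Only the spike is changed here; the other 19 values stay as they were. Note that with very short series a single outlier cannot exceed mean + 3*sd at all, so the cap only has an effect once there are enough observations.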

Upvotes: 1
