kms

Reputation: 2014

Detect and fix outliers in a pandas series

I have a pandas Series with some outlier values. Here's some mock data:

import pandas as pd
df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})

I'd like to replace outliers, i.e. values that are 3 or more standard deviations from the mean, with the mean value.

Upvotes: 0

Views: 123

Answers (2)

haneulkim

Reputation: 4928

import numpy as np
std_dev = df["col1"].std()
mean = df["col1"].mean()
# Replace values at or above mean + 3 standard deviations with the mean
df["col1"] = np.where(df["col1"] >= (mean + 3 * std_dev), mean, df["col1"])
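
The snippet above only replaces values on the high side. If "3 standard deviations from the mean" is meant in both directions, a minimal sketch using the absolute deviation could look like this. It reuses the question's mock data, and `is_outlier` is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})

mean = df["col1"].mean()
std_dev = df["col1"].std()  # sample standard deviation (ddof=1), the pandas default

# Flag values at least 3 standard deviations away from the mean, in either direction
is_outlier = (df["col1"] - mean).abs() >= 3 * std_dev

# np.where returns a float array here because the mean is a float
df["col1"] = np.where(is_outlier, mean, df["col1"])
```

With this mock data only 1200 is flagged. 400 stays, because the large value itself inflates the standard deviation and pushes the upper threshold to roughly 943.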

Upvotes: 1

kelvt

Reputation: 1038

Let's do:

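# The mean is a float, so cast the integer column before assigning into it
df['col1'] = df['col1'].astype(float)
thrs = df['col1'].mean() + 3 * df['col1'].std()
df.loc[df['col1'] >= thrs, 'col1'] = df['col1'].mean()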
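
If the original column should stay untouched, the same threshold logic can be wrapped in a small function built on `Series.mask`. This is only a sketch: `replace_outliers_with_mean` and its `n_std` parameter are hypothetical names, not part of pandas.

```python
import pandas as pd


def replace_outliers_with_mean(s: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Return a copy of `s` with high-side outliers replaced by the mean.

    Hypothetical helper, not part of pandas. The mean and standard deviation
    are computed once, from the original data (outliers included).
    """
    s = s.astype(float)  # the replacement value is a float
    mean = s.mean()
    threshold = mean + n_std * s.std()
    # Series.mask replaces values where the condition is True
    return s.mask(s >= threshold, mean)


df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})
cleaned = replace_outliers_with_mean(df['col1'])
```

Here `cleaned` is a new float Series and `df['col1']` keeps its original values, which makes it easy to compare before and after.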

Upvotes: 1
