Reputation: 345
I wrote some code for outlier detection in Python using the z-score method. You can see my data and my code below.
import numpy as np
import pandas as pd
from scipy import stats
data = [5, 10, 15, 20, 25, 30, 36, 22]
data.append(180)
data = pd.DataFrame(data, columns=["Data"])
z = np.abs(stats.zscore(data))
print(z)
print(np.where(z > 1.5))
I wrote this code to detect outliers. Specifically, I wanted to get the indices of the values with a z-score higher than 1.5. But I think something is wrong with the output.
Data
0 0.649600
1 0.551506
2 0.453412
3 0.355318
4 0.257224
5 0.159130
6 0.041417
7 0.316080
8 2.783688
(array([8], dtype=int64), array([0], dtype=int64))
The z-score of the element at index 8 is higher than 1.5, and it shows up in the output, which is what I expected. But index 0 also shows up, and its z-score is only 0.64. What am I doing wrong?
Upvotes: 0
Views: 1025
Reputation: 1413
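np.where on a 2D input (here a one-column DataFrame) returns one index array per dimension, so (array([8]), array([0])) means row 8, column 0, not rows 8 and 0. To get only the row indices, you could compute the z-scores on the plain list and take the first array, something like this: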
import numpy as np
from scipy import stats
data =[5,10,15,20,25,30,36,22]
data.append(180)
z = stats.zscore(data)
np.where(z > 1.5)[0]
output:
array([8])
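To see the difference side by side, here is a minimal sketch that runs the same threshold on a 2D column and on a flat vector. The names values, z_2d, z_1d, rows, cols and idx are just illustrative, and the (9, 1) NumPy array stands in for the one-column DataFrame from the question (an assumption, to keep the example independent of how a given SciPy version handles DataFrames).

```python
import numpy as np
from scipy import stats

values = np.array([5, 10, 15, 20, 25, 30, 36, 22, 180], dtype=float)

# 2D case: shape (9, 1), standing in for a one-column DataFrame.
# np.where returns one index array per dimension: (rows, columns).
z_2d = np.abs(stats.zscore(values.reshape(-1, 1)))
rows, cols = np.where(z_2d > 1.5)
print(rows, cols)  # row 8, column 0

# 1D case: shape (9,), so np.where returns a 1-tuple of indices.
z_1d = np.abs(stats.zscore(values))
(idx,) = np.where(z_1d > 1.5)
print(idx)  # only index 8
```

Keeping np.abs, as in the question, also flags unusually low values; without it only values far above the mean pass the threshold.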
Upvotes: 2