How to find outliers in a given dataset using Python

Question

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]

slope = 0.76083799
intercept = 77.87127406

The point drawn with the brown marker, (259, 168), looks like an outlier to me and needs to be removed. I am currently trying to use studentized residuals and jackknife residuals to detect such outliers, but I cannot work out how to calculate these residuals for the dataset I have.

It would be really helpful if someone could show me how to compute these residuals for the dataset above.

CODE

import numpy as np
import matplotlib.pyplot as plt

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]

x = [point[0] for point in coordinates]
y = [point[1] for point in coordinates]

# scatter plot of the raw points
for x1, y1 in coordinates:
    plt.plot(x1, y1, marker="o", color="brown")
plt.show()

# use numpy's polyfit to find the slope and intercept of the regression line
z = np.polyfit(x, y, 1)
print(z)
slope = z[0]
intercept = z[1]

# evaluate the fitted line on a dense grid and draw it over the data
newx = np.linspace(62, 321, 200)
line = np.poly1d(z)
plt.plot(x, y, 'o', newx, line(newx), color="black")
# highlight the suspected outlier
plt.plot(259, 168, marker="o", color="brown")
plt.show()

#TODO
#remove the outliers and then display
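
One possible way to fill in the TODO, as a minimal sketch rather than a definitive answer: compute the internally studentized residuals and the jackknife (externally studentized) residuals directly from the straight-line fit with numpy, then drop the points whose jackknife residual is large. The helper name `studentized_residuals` and the cutoff `THRESHOLD = 3.0` are assumptions made for illustration; they are not part of the original code, and with only five points any cutoff is a rough rule of thumb.

```python
import numpy as np

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]
x = np.array([p[0] for p in coordinates], dtype=float)
y = np.array([p[1] for p in coordinates], dtype=float)


def studentized_residuals(x, y):
    """Hypothetical helper: (internal, external) studentized residuals
    of a straight-line least-squares fit y = slope * x + intercept."""
    n, p = len(x), 2  # p = number of fitted parameters (slope, intercept)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)  # raw residuals

    # leverage h_i of each point for simple linear regression
    dx = x - x.mean()
    h = 1.0 / n + dx ** 2 / np.sum(dx ** 2)

    # internally studentized: residual variance estimated from all points
    sse = np.sum(resid ** 2)
    s2 = sse / (n - p)
    internal = resid / np.sqrt(s2 * (1.0 - h))

    # jackknife: residual variance re-estimated with point i left out
    s2_loo = (sse - resid ** 2 / (1.0 - h)) / (n - p - 1)
    external = resid / np.sqrt(s2_loo * (1.0 - h))
    return internal, external


internal, external = studentized_residuals(x, y)

THRESHOLD = 3.0  # assumed cutoff, not from the original post
keep = np.abs(external) <= THRESHOLD
x_clean, y_clean = x[keep], y[keep]
removed = [coordinates[i] for i in np.flatnonzero(~keep)]

# refit the line on the remaining points
clean_slope, clean_intercept = np.polyfit(x_clean, y_clean, 1)

print("internal:", np.round(internal, 3))
print("jackknife:", np.round(external, 3))
print("removed:", removed)
```

The arrays `x_clean` and `y_clean` can then be passed to the same `plt.plot` calls used above to display the data and the refitted line without the flagged point. If statsmodels is available, its `OLSInfluence` class exposes comparable quantities, which could serve as a cross-check.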
