Reputation: 1920
coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]
slope = 0.76083799
intercept = 77.87127406
The point drawn with the brown marker, (259, 168), looks like an outlier to me and needs to be removed. I am trying to use studentized residuals and jackknife residuals to detect such outliers, but I have not been able to calculate these residuals for this dataset.
It would be really helpful if someone could show me how to compute these residuals for the dataset above.
CODE
import numpy as np
import matplotlib.pyplot as plt
coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]
x=[x1[0] for x1 in coordinates]
y=[x1[1] for x1 in coordinates]
for x1,y1 in coordinates:
    plt.plot(x1,y1,marker="o",color="brown")
plt.show()
# using numpy polyfit method to find regression line slope and intercept
z = np.polyfit(x,y,1)
print(z)
slope = z[0]
intercept =z[1]
newx = np.linspace(62,321,200)
newy = np.poly1d(z)
plt.plot(x,y, 'o', newx, newy(newx),color="black")
# plt.plot()
plt.plot(259,168,marker="o",color="brown")
plt.show()
#TODO
#remove the outliers and then display
Upvotes: 0
Views: 2185
Reputation: 892
x and y are converted to np.ndarrays at the start so that the residuals can be computed with array arithmetic.
Input:
import numpy as np
import matplotlib.pyplot as plt
coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]
x=np.array([x1[0] for x1 in coordinates]) #Placed into array
y=np.array([x1[1] for x1 in coordinates]) #Placed into array
for x1,y1 in coordinates:
    plt.plot(x1,y1,marker="o",color="brown")
plt.show()
# using numpy polyfit method to find regression line slope and intercept
z = np.polyfit(x,y,1)
print(z)
slope = z[0]
intercept =z[1]
newx = np.linspace(62,321,200)
newy = np.poly1d(z)
plt.plot(x,y, 'o', newx, newy(newx),color="black")
# plt.plot()
plt.plot(259,168,marker="o",color="brown")
plt.show()
Additional code:
print("old y: " + repr(y)) #Display original array of y values
print("old x: " + repr(x))
residual_array = abs(y - (intercept + slope * x)) #Create an array of residuals
max_accept_deviation = 100 #An arbitrary value of "acceptable deviation"
mask = residual_array >= max_accept_deviation #Create an array of TRUE/FALSE values. TRUE where residual array is larger than deviation
rows_to_del = np.where(mask)[0] #np.where returns the row numbers where the mask is TRUE
cleaned_y = np.delete(y,rows_to_del) #np.delete removes those row numbers from the array
cleaned_x = np.delete(x,rows_to_del)
print("new y: " + repr(cleaned_y)) #Print the cleaned values
print("new x: " + repr(cleaned_x))
Output:
[ 0.76083799 77.87127406]
old y: array([168, 133, 163, 270, 385])
old x: array([259, 62, 143, 174, 321])
new y: array([133, 163, 270, 385])
new x: array([ 62, 143, 174, 321])
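The code above filters on the raw residual with an arbitrary cutoff. Since the question asked about studentized and jackknife residuals, here is a minimal sketch of how they could be computed with plain NumPy for a straight-line fit. It uses the standard leverage formula for simple linear regression and the usual shortcut that derives the jackknife (externally studentized) residual from the internally studentized one. The variable names and the cutoff of 3.0 are my own assumptions, not something from the question, and with only five points any cutoff should be treated with caution.

```python
import numpy as np

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]
x = np.array([pt[0] for pt in coordinates], dtype=float)
y = np.array([pt[1] for pt in coordinates], dtype=float)

n = len(x)
p = 2  # number of fitted parameters: slope and intercept

# Ordinary least-squares line and raw residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Leverage of each point for simple linear regression
sxx = np.sum((x - x.mean()) ** 2)
leverage = 1.0 / n + (x - x.mean()) ** 2 / sxx

# Residual variance estimated from all points
s2 = np.sum(residuals ** 2) / (n - p)

# Internally studentized residuals
student = residuals / np.sqrt(s2 * (1.0 - leverage))

# Jackknife (externally studentized) residuals, obtained from the
# internally studentized ones without refitting n times
jackknife = student * np.sqrt((n - p - 1) / (n - p - student ** 2))

# Assumed cutoff, chosen only for illustration
threshold = 3.0
keep = np.abs(jackknife) < threshold
cleaned_x, cleaned_y = x[keep], y[keep]

# Refit on the remaining points
new_slope, new_intercept = np.polyfit(cleaned_x, cleaned_y, 1)

print("studentized:", student)
print("jackknife:  ", jackknife)
print("kept x:", cleaned_x)
print("kept y:", cleaned_y)
```

For this dataset the point (259, 168) has the largest jackknife residual in absolute value and is the only one dropped by the assumed cutoff. The refitted line can then be plotted the same way as before, using new_slope and new_intercept.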
Upvotes: 1