Reputation: 864
I wrote some code to find the best fitting line for a couple of data points using the analytical solution to least squares. Now I would like to print the error between the actual data and my estimated line, but I have no idea how to compute it. Here is my code:
import numpy as np
import matplotlib.pyplot as plt

# Design matrix: first column holds the x-values, second column the intercept term
A = np.array(((0, 1),
              (1, 1),
              (2, 1),
              (3, 1)))
b = np.array((1, 2, 0, 3), ndmin=2).T

# Normal-equation solution: xstar = (A^T A)^-1 A^T b
xstar = np.linalg.inv(A.T @ A) @ A.T @ b
print(xstar)

plt.scatter(A.T[0], b)
u = np.linspace(0, 3, 20)
plt.plot(u, u * xstar[0] + xstar[1], 'b-')
plt.show()
Upvotes: 1
Views: 4616
Reputation: 2905
Note that numpy has a function for this, called np.linalg.lstsq (i.e. least squares), which returns the residuals as well as the solution, so you don't have to implement it yourself:
xstar, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
SSE = residuals[0]   # lstsq returns the sum of squared residuals
MSE = SSE / len(b)   # divide by the number of data points for the mean
try it!
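As a quick sanity check (a minimal sketch reusing A, b and xstar from above), you can recompute the residuals by hand and compare with what lstsq reports:

r = b - A @ xstar    # residual vector at the data points
print(np.sum(r**2))  # should match residuals[0], i.e. the SSE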
Upvotes: 0
Reputation: 1350
You have already plotted the predictions from the linear regression. Evaluate the fitted line at the original x-values (not at the plotting grid u), and from those predictions you can calculate the "sum of squared errors (SSE)" or the "mean squared error (MSE)" as follows:
y_prediction = A @ xstar   # predictions at the data points, not at the plotting grid u
SSE = np.sum(np.square(y_prediction - b))
MSE = np.mean(np.square(y_prediction - b))
print(SSE)
print(MSE)
As an aside, you might want to use np.linalg.pinv rather than np.linalg.inv, as the pseudo-inverse is a more numerically stable way to perform this computation.
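For example (a minimal sketch reusing the A and b from the question), the pseudo-inverse version would look like this:

# Drop-in replacement for np.linalg.inv in the question's formula
xstar = np.linalg.pinv(A.T @ A) @ A.T @ b
# Equivalently, the pseudo-inverse of A solves the least-squares problem directly
xstar = np.linalg.pinv(A) @ b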
Upvotes: 4