Reputation: 1560
I have implemented a method to compute the betas for an OLS regression in Python. Now I would like to score my model using R^2. For my assignment I am not allowed to use Python packages to do so, so I have to implement the method from scratch.
#load the data
import numpy as np
import pandas as pd
from numpy.linalg import inv
from sklearn.datasets import load_boston
boston = load_boston()
# Set the X and y variables.
X = boston.data
y = boston.target
#append ones to my X matrix.
int = np.ones(shape=y.shape)[..., None]
X = np.concatenate((int, X), 1)
#compute betas.
betas = inv(X.transpose().dot(X)).dot(X.transpose()).dot(y)
# extract the feature names of the boston data set and prepend the
#intercept
names = np.insert(boston.feature_names, 0, 'INT')
# collect results into a DataFrame for pretty printing
results = pd.DataFrame({'coeffs':betas}, index=names)
#print the results
print(results)
            coeffs
INT      36.491103
CRIM     -0.107171
ZN        0.046395
INDUS     0.020860
CHAS      2.688561
NOX     -17.795759
RM        3.804752
AGE       0.000751
DIS      -1.475759
RAD       0.305655
TAX      -0.012329
PTRATIO  -0.953464
B         0.009393
LSTAT    -0.525467
Now I would like to implement R^2 to score my model on this data (or any other data); see https://en.wikipedia.org/wiki/Coefficient_of_determination.
My issue is I am not entirely certain how to compute the numerator, SSE. In code it would look like this:
#numerator
sse = sum((Y - yhat ** 2)
Where Y are the Boston house prices, and yhat are the predicted prices of these houses. However, how do I compute the term yhat?
Upvotes: 0
Views: 655
Reputation: 109626
yhat is your estimate for a given observation. You can obtain all of your estimates simultaneously using the dot product via X.dot(betas).
Your sum of squared errors would be the following (note the correction to the version you gave: you need to square the difference, i.e. square the errors):
y_hat = X.dot(betas)
errors = y - y_hat
sse = (errors ** 2).sum()
Your total sum of squares:
tss = ((y - y.mean()) ** 2).sum()
And the resulting R-squared (coefficient of determination):
r2 = 1 - sse / tss
Also, I wouldn't use int as a variable name, to avoid clobbering the built-in int function (just call it ones or const instead).
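Putting the pieces together, here is a minimal sketch of the whole calculation as a reusable function. The name r_squared, the synthetic data and the *_demo variables are assumptions made for illustration and are not part of the question's code. The sketch also solves the normal equations with np.linalg.solve rather than inv, which is a substitution on my part; the betas from the question's inv-based formula should work just as well.

```python
import numpy as np


def r_squared(X, y, betas):
    """Return R^2 for a linear model with design matrix X and coefficients betas."""
    y_hat = X.dot(betas)                  # predictions for every observation
    sse = ((y - y_hat) ** 2).sum()        # sum of squared errors
    tss = ((y - y.mean()) ** 2).sum()     # total sum of squares
    return 1 - sse / tss


# Synthetic data (hypothetical, only to make the sketch runnable).
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 3))
ones = np.ones((n, 1))                    # intercept column, not named `int`
X_demo = np.concatenate((ones, features), axis=1)
true_betas = np.array([2.0, 1.5, -0.5, 0.25])
y_demo = X_demo.dot(true_betas) + rng.normal(scale=0.5, size=n)

# OLS betas from the normal equations (X'X) b = X'y.
betas_demo = np.linalg.solve(X_demo.T.dot(X_demo), X_demo.T.dot(y_demo))

r2_demo = r_squared(X_demo, y_demo, betas_demo)
print(r2_demo)
```

With the question's variables, the call would be r_squared(X, y, betas). Two quick sanity checks: a model that reproduces y exactly should score 1, and a model that always predicts y.mean() should score 0.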
Upvotes: 1