Reputation: 875
I want to compute AIC for linear models to compare their complexity. I did it as follows:
regr = linear_model.LinearRegression()
regr.fit(X, y)
aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)
def aic(y, y_pred, k):
resid = y - y_pred.ravel()
sse = sum(resid ** 2)
AIC = 2*k - 2*np.log(sse)
return AIC
But I receive a divide by zero encountered in log
error.
Upvotes: 18
Views: 39482
Reputation: 90
Here is an example implementation of AIC from the link that was given in the previous answer.
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import numpy as np
from sklearn import linear_model
x = np.array([1,2,3,4,5])
y = np.array([0.3, 0.4, 0.4, 0.5, 0.6])
# 1 feature and constant
p = 1+1
lr = linear_model.LinearRegression()
lr = lr.fit(x.reshape(-1, 1), y)
pr = lr.predict(x.reshape(-1, 1))
def llf_(y, X, pr):
# return maximized log likelihood
nobs = float(X.shape[0])
nobs2 = nobs / 2.0
nobs = float(nobs)
resid = y - pr
ssr = np.sum((resid)**2)
llf = -nobs2*np.log(2*np.pi) - nobs2*np.log(ssr / nobs) - nobs2
return llf
def aic(y, X, pr, p):
# return aic metric
llf = llf_(y, X, pr)
return -2*llf+2*p
regr = OLS(y, add_constant(x.reshape(-1, 1))).fit()
print(regr.aic)
print(aic(y, x, pr, p))
Output:
-18.903519181693923 # OLS AIC
-18.903519181693916 # our AIC
Upvotes: 1
Reputation: 40888
sklearn
's LinearRegression
is good for prediction but pretty barebones as you've discovered. (It's often said that sklearn stays away from all things statistical inference.)
statsmodels.regression.linear_model.OLS
has a property attribute AIC
and a number of other pre-canned attributes.
However, note that you'll need to manually add a unit vector to your X
matrix to include an intercept in your model.
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
regr = OLS(y, add_constant(X)).fit()
print(regr.aic)
Source is here if you are looking for an alternative way to write manually while still using sklearn
.
Upvotes: 21