YNR
YNR

Reputation: 875

How to compute AIC for linear regression model in Python?

I want to compute AIC for linear models to compare their complexity. I did it as follows:

regr = linear_model.LinearRegression()
regr.fit(X, y)

aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

def aic(y, y_pred, k):
   resid = y - y_pred.ravel()
   sse = sum(resid ** 2)

   AIC = 2*k - 2*np.log(sse)

return AIC

But I receive a divide by zero encountered in log error.

Upvotes: 18

Views: 39482

Answers (2)

dragondip
dragondip

Reputation: 90

Here is an example implementation of AIC from the link that was given in the previous answer.

from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import numpy as np
from sklearn import linear_model

x = np.array([1,2,3,4,5])
y = np.array([0.3, 0.4, 0.4, 0.5, 0.6])

# 1 feature and constant
p = 1+1 

lr = linear_model.LinearRegression()
lr = lr.fit(x.reshape(-1, 1), y)
pr = lr.predict(x.reshape(-1, 1))

def llf_(y, X, pr):
    # return maximized log likelihood
    nobs = float(X.shape[0])
    nobs2 = nobs / 2.0
    nobs = float(nobs)
    resid = y - pr
    ssr = np.sum((resid)**2)
    llf = -nobs2*np.log(2*np.pi) - nobs2*np.log(ssr / nobs) - nobs2
    return llf

def aic(y, X, pr, p):
    # return aic metric
    llf = llf_(y, X, pr)
    return -2*llf+2*p

regr = OLS(y, add_constant(x.reshape(-1, 1))).fit()
print(regr.aic)
print(aic(y, x, pr, p))

Output:

-18.903519181693923 # OLS AIC
-18.903519181693916 # our AIC

Upvotes: 1

Brad Solomon
Brad Solomon

Reputation: 40888

sklearn's LinearRegression is good for prediction but pretty barebones as you've discovered. (It's often said that sklearn stays away from all things statistical inference.)

statsmodels.regression.linear_model.OLS has a property attribute AIC and a number of other pre-canned attributes.

However, note that you'll need to manually add a unit vector to your X matrix to include an intercept in your model.

from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant

regr = OLS(y, add_constant(X)).fit()
print(regr.aic)

Source is here if you are looking for an alternative way to write manually while still using sklearn.

Upvotes: 21

Related Questions