Sorade

Reputation: 937

How to output Regression Analysis summary from polynomial regression with scikit-learn?

I currently have the following code, which does a polynomial regression on a dataset with 4 variables:

from numpy import genfromtxt, savetxt
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

def polyreg():
    dataset = genfromtxt(open('train.csv', 'r'), delimiter=',', dtype='f8')[1:]
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt(open('test.csv', 'r'), delimiter=',', dtype='f8')[1:]

    poly = PolynomialFeatures(degree=2)
    train_poly = poly.fit_transform(train)
    test_poly = poly.transform(test)  # reuse the transform fitted on the training data

    clf = linear_model.LinearRegression()
    clf.fit(train_poly, target)

    savetxt('polyreg_test1.csv', clf.predict(test_poly), delimiter=',', fmt='%f')

I wanted to know if there is a way to output a summary of the regression, like the one Excel produces. I explored the attributes and methods of linear_model.LinearRegression() but couldn't find anything.

[Screenshot: Excel's Regression Analysis summary output]

Upvotes: 2

Views: 6191

Answers (1)

maxymoo

Reputation: 36555

This is not implemented in scikit-learn; the scikit-learn ecosystem is quite biased towards using cross-validation for model evaluation (a good thing, in my opinion; most of these test statistics were developed out of necessity before computers were powerful enough for cross-validation to be feasible).

For more traditional statistical analysis you can use statsmodels. Here is an example taken from its documentation:

import numpy as np
import statsmodels.api as sm

# Simulate data from a quadratic model: y = 1 + 0.1*x + 10*x**2 + noise
nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
e = np.random.normal(size=nsample)

X = sm.add_constant(X)  # prepend an intercept column
y = np.dot(X, beta) + e

model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.020e+06
Date:                Sun, 01 Feb 2015   Prob (F-statistic):          2.83e-239
Time:                        09:32:32   Log-Likelihood:                -146.51
No. Observations:                 100   AIC:                             299.0
Df Residuals:                      97   BIC:                             306.8
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.3423      0.313      4.292      0.000         0.722     1.963
x1            -0.0402      0.145     -0.278      0.781        -0.327     0.247
x2            10.0103      0.014    715.745      0.000         9.982    10.038
==============================================================================
Omnibus:                        2.042   Durbin-Watson:                   2.274
Prob(Omnibus):                  0.360   Jarque-Bera (JB):                1.875
Skew:                           0.234   Prob(JB):                        0.392
Kurtosis:                       2.519   Cond. No.                         144.
==============================================================================

Upvotes: 4
