Reputation: 666
I'm pretty sure it's a feature, not a bug, but I would like to know if there is a way to make sklearn and statsmodels match in their logit estimates. A very simple example:
import numpy as np
import statsmodels.formula.api as sm
from sklearn.linear_model import LogisticRegression
np.random.seed(123)
n = 100
y = np.random.random_integers(0, 1, n)
x = np.random.random((n, 2))
# Constant term
x[:, 0] = 1.
The estimates with statsmodels:
sm_lgt = sm.Logit(y, x).fit()
Optimization terminated successfully.
Current function value: 0.675320
Iterations 4
print sm_lgt.params
[ 0.38442 -1.1429183]
And the estimates with sklearn:
sk_lgt = LogisticRegression(fit_intercept=False).fit(x, y)
print sk_lgt.coef_
[[ 0.16546794 -0.72637982]]
I think it has to do with the implementation in sklearn, which uses some sort of regularization. Is there an option to estimate a barebones logit as in statsmodels (it's substantially faster and scales much more nicely)? Also, does sklearn provide inference (standard errors) or marginal effects?
Upvotes: 4
Views: 5353
Reputation: 5698
As an additional note, I was struggling with differences in results when my design matrix was collinear. Obviously this means there should be some additional preprocessing to get a reliable result, but I was still hoping to find out why I got a result with sklearn while statsmodels errored out.
Short answer: setting method='bfgs' when calling fit in statsmodels gives almost identical results to the sklearn model, even in the case of collinear variables (once one has taken care of the fact that the default for statsmodels is no intercept, while sklearn fits the intercept by default).
Example (adapted from a similar question on OLS):
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
np.random.seed(237)
num_samples=1000
X=np.random.random((num_samples, 2))
X[:, 1] = 2*X[:, 0]
X_sm = sm.add_constant(X)
beta = [1, -2, .5]
error = np.random.random(num_samples)
y = np.round(1 / (1 + np.exp(-np.dot(X_sm, beta) + error)))  # logistic of X*beta plus noise, rounded to 0/1
lr = LogisticRegression(C=1e9).fit(X, y)
print "sklearn:"
print lr.intercept_
print lr.coef_
print "statsmodels:"
print sm.Logit(y, X_sm).fit(method='bfgs').params # method='nm' or default method errors out
(PS: If anyone has comments on the math behind these two solvers and the reliability of the results, I would love to hear it! I find it interesting that sklearn doesn't even throw a warning for this...)
Upvotes: 0
Reputation: 363497
Is there an option to estimate a barebones logit as in statsmodels?

You can set the C (inverse regularization strength) parameter to an arbitrarily high constant, as long as it's finite:
>>> sk_lgt = LogisticRegression(fit_intercept=False, C=1e9).fit(x, y)
>>> print(sk_lgt.coef_)
[[ 0.38440594 -1.14287175]]
Turning the regularization off is impossible because this is not supported by the underlying solver, Liblinear.
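Roughly speaking, Liblinear's L2-penalized objective is something like 0.5·w'w + C·Σ log(1 + exp(−yᵢ·w'xᵢ)) (with the labels coded as ±1), so as C grows the penalty term becomes negligible relative to the data term and the solution approaches the plain maximum-likelihood fit. A quick sketch, reusing x, y and sm_lgt from the question, to see the convergence:
for C in (1.0, 1e2, 1e4, 1e9):
    # larger C = weaker penalty; coefficients drift toward the unpenalized MLE
    print(C, LogisticRegression(fit_intercept=False, C=C).fit(x, y).coef_)
print(sm_lgt.params)  # statsmodels (unpenalized) estimates for comparison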
Also, does sklearn provide inference (standard errors) or marginal effects?
No. There's a proposal to add this, but it's not in the master codebase yet.
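If you only need the usual maximum-likelihood standard errors, you can compute them by hand from the inverse observed information, cov(β̂) ≈ (X'WX)⁻¹ with W = diag(pᵢ(1 − pᵢ)). A rough sketch (not an sklearn API), assuming the fit above with C=1e9 so the penalty is effectively off, and reusing x and sk_lgt:
p = 1.0 / (1.0 + np.exp(-x.dot(sk_lgt.coef_.ravel())))  # fitted probabilities
W = p * (1 - p)                                         # diagonal of the weight matrix
cov = np.linalg.inv((x * W[:, None]).T.dot(x))          # (X' W X)^{-1}
print(np.sqrt(np.diag(cov)))                            # should roughly match sm_lgt.bse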
Upvotes: 9