Runner Bean

Reputation: 5195

Python Sklearn Logistic Regression Model Incorrect Fit

I am trying to reproduce the results from the Wikipedia page on logistic regression. My code looks like this:

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([0.5, 0.75, 1, 1.25, 1.5, 1.75, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

logistic = LogisticRegression()
logistic.fit(x[:, None], y)

How can I then obtain a summary of the fitted model, specifically something like this:

            Coefficient  Std.Error  z-value  P-value (Wald)
Intercept   −4.0777      1.7610     −2.316    0.0206
Hours        1.5046      0.6287      2.393    0.0167

This is what the Wikipedia page reports for the fitted model. If I print the coefficient and the intercept, I get something like:

print(logistic.coef_)
print(logistic.intercept_)

[[ 0.61126347]]

[-1.36550178]

These values are clearly different from the ones in the table above.

Why do my results differ from those on the Wikipedia page?

Upvotes: 3

Views: 1209

Answers (2)

Matt Hancock

Reputation: 4039

The Wikipedia example does not regularize the model parameters, but sklearn's LogisticRegression applies L2 regularization by default. To effectively switch it off, set the inverse regularization strength, C, to a very large value, e.g.:

logistic = LogisticRegression(penalty='l2', C=1e4)
logistic.fit(x[:, None], y)

print(logistic.coef_)
print(logistic.intercept_)

# [[ 1.50459727]]
# [-4.07757136]
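
To make the effect of C visible, here is a minimal sketch that fits the same data twice, once with the default settings and once with a very large C, and compares the two fits. The particular values of C, tol and max_iter are my own illustrative choices; the numbers from the default fit depend on the sklearn version and solver, so no output is shown for them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hours studied and pass/fail outcome, as in the question.
x = np.array([0.5, 0.75, 1, 1.25, 1.5, 1.75, 1.75, 2, 2.25, 2.5,
              2.75, 3, 3.25, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
X = x[:, None]  # sklearn expects a 2-D feature matrix

# Default fit: L2 penalty with C=1.0, which shrinks the coefficient.
default_fit = LogisticRegression().fit(X, y)

# Very large C (assumed value) makes the penalty negligible; the tighter
# tolerance and higher iteration cap are only there to let the solver
# converge closely to the maximum-likelihood estimate.
weak_reg_fit = LogisticRegression(C=1e6, tol=1e-8, max_iter=1000).fit(X, y)

default_slope = default_fit.coef_[0, 0]
weak_reg_slope = weak_reg_fit.coef_[0, 0]
weak_reg_intercept = weak_reg_fit.intercept_[0]

print("default C:  slope =", default_slope)
print("large C:    slope =", weak_reg_slope, " intercept =", weak_reg_intercept)
```

The large-C fit should land close to the Wikipedia values (about 1.5046 for the slope and -4.0777 for the intercept), while the default fit gives a noticeably smaller slope.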

Upvotes: 4

Kousik Krishnan

Reputation: 98

sklearn has no R-style summary report.

For classification tasks there is the function sklearn.metrics.classification_report, which computes several predictive scores (precision, recall, F1-score and support).

For an R-style summary report, take a look at the statsmodels library.

Upvotes: 3
