Reputation: 5195
I am trying to reproduce the results from the Wikipedia logistic regression page. My code looks like this:
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.array([0.5, 0.75, 1, 1.25, 1.5, 1.75, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
logistic = LogisticRegression()
logistic.fit(x[:, None], y)
How can I then obtain a summary of the fitted model, specifically something like this:
            Coefficient  Std.Error  z-value  P-value (Wald)
Intercept   −4.0777      1.7610     −2.316   0.0206
Hours       1.5046       0.6287     2.393    0.0167
This is what the Wikipedia page reports for the fitted model. If I print the coefficients and the intercept, I get something like:
print(logistic.coef_)
print(logistic.intercept_)
[[ 0.61126347]]
[-1.36550178]
Which is obviously different.
The question is: why do my results differ from the ones on the Wikipedia page?
Upvotes: 3
Views: 1209
Reputation: 4039
The Wikipedia example does not regularize the model parameters, but sklearn's LogisticRegression applies L2 regularization by default. To make the regularization negligible, set the inverse regularization strength, C, to a very high value, e.g.,
logistic = LogisticRegression(penalty='l2', C=1e4)
logistic.fit(x[:, None],y)
print(logistic.coef_)
print(logistic.intercept_)
# [[ 1.50459727]]
# [-4.07757136]
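To see how the estimates approach the unregularized fit as C grows, here is a minimal sketch. The helper name fit_with_c and the tol/max_iter settings are my own choices for illustration, not part of the original answer, and the exact numbers printed will depend on the sklearn version and its default solver.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([0.5, 0.75, 1, 1.25, 1.5, 1.75, 1.75, 2, 2.25, 2.5,
              2.75, 3, 3.25, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])


def fit_with_c(c):
    """Fit the hours/pass data with inverse regularization strength c.

    Hypothetical helper; a tight tol and generous max_iter are assumed so the
    solver gets close to the optimum even when the penalty is tiny.
    """
    model = LogisticRegression(C=c, tol=1e-8, max_iter=10000)
    model.fit(x[:, None], y)
    return model.intercept_[0], model.coef_[0, 0]


# Larger C means a weaker L2 penalty, so the slope shrinks less toward zero.
for c in (1.0, 1e2, 1e4, 1e6):
    intercept, slope = fit_with_c(c)
    print(f"C={c:g}: intercept={intercept:.4f}, slope={slope:.4f}")
```

For large C the printed values should settle near the Wikipedia coefficients quoted in the question.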
Upvotes: 4
Reputation: 98
sklearn has no R-style summary report.
For classification tasks, there is sklearn.metrics.classification_report, which computes several predictive scores (precision, recall, F1).
For an R-style summary report, take a look at the statsmodels library.
Upvotes: 3