Flyn Sequeira

Reputation: 692

Python and SPSS giving different output for Logistic Regression

Code:

from sklearn.linear_model import LogisticRegression

l = LogisticRegression()
b = l.fit(XT, Y)
print("coeff ", b.coef_)
print("intercept ", b.intercept_)

Here's the dataset:

XT =
[[23]
 [24]
 [26]
 [21]
 [29]
 [31]
 [27]
 [24]
 [22]
 [23]]
Y = [1 0 1 0 0 1 1 0 1 0]
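For anyone trying to reproduce this, here is a minimal sketch of the same data as runnable NumPy arrays (the printed arrays above suggest these shapes):

import numpy as np

# the question's data: one feature column per sample, binary labels
XT = np.array([[23], [24], [26], [21], [29], [31], [27], [24], [22], [23]])
Y = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])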

Result:

coeff  [[ 0.00850441]]
intercept  [-0.15184511]

Now I added the same data in SPSS: Analyze -> Regression -> Binary Logistic Regression. I set the corresponding Y -> Dependent and XT -> Covariates. The results weren't even close. Am I missing something in Python or SPSS?

[screenshots: result of binary logistic regression in SPSS; Python/sklearn output]

Upvotes: 2

Views: 1299

Answers (3)

Greg C

Reputation: 11

With sklearn you can also "turn off" the regularization by setting the penalty to none, so that no regularization is applied at all. This makes sklearn's logistic regression give results similar to SPSS.

An example of a logistic regression from sklearn with 1000 iterations and no penalty is:

from sklearn.linear_model import LogisticRegression

# penalty='none' disables regularization (scikit-learn 0.21+);
# newer versions accept penalty=None instead of the string
lr = LogisticRegression(max_iter=1000, penalty='none')
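As a quick check, a sketch of fitting the question's data with the penalty disabled (assuming the XT and Y arrays from the question):

lr.fit(XT, Y)
print("coeff ", lr.coef_)
print("intercept ", lr.intercept_)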

Upvotes: 1

Flyn Sequeira

Reputation: 692

Solved it myself. I tried changing the C value in LogisticRegression(C=100). That did the trick. C=1000 got the result closest to the SPSS and textbook result.
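A minimal sketch of that change, assuming the XT and Y arrays from the question:

from sklearn.linear_model import LogisticRegression

# large C means weak regularization, approximating SPSS's unregularized fit
l = LogisticRegression(C=1000)
b = l.fit(XT, Y)
print("coeff ", b.coef_)
print("intercept ", b.intercept_)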

Hope this helps anyone who faces a problem with LogisticRegression in Python.

Upvotes: 3

James Alvarez

Reputation: 7219

SPSS Logistic regression does not include parameter regularisation in its cost function; it just does 'raw' logistic regression. In regularisation, the cost function includes a regularisation term to prevent overfitting. In sklearn you specify the inverse of the regularisation strength with the C value. If you set C to a very high value, it will closely mimic SPSS. There is no magic number: just set it as high as you can, and there will be effectively no regularisation.
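A minimal sketch of this effect, assuming the XT and Y arrays from the question (the exact numbers depend on your scikit-learn version):

from sklearn.linear_model import LogisticRegression

# C is the inverse of the regularisation strength:
# small C = strong regularisation, huge C = essentially none
for C in (1.0, 100.0, 1e6):
    m = LogisticRegression(C=C).fit(XT, Y)
    print(C, m.coef_, m.intercept_)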

Upvotes: 2
