Karim Lameer
Reputation: 183

Python statsmodels.api logistic regression (Logit)

I'm trying to predict a binary outcome with logistic regression using Python's statsmodels.api. I'm using Logit, as in the tutorials. When I predict on a test dataset, the output for each record is a decimal between 0 and 1. Shouldn't it give me zeros and ones? Or do I have to convert the values myself, with a round function or something similar?

Excuse the noobiness of this question. I am starting my journey.

Upvotes: 4

Views: 15124

Answers (2)

user1544219
If the response lies on the unit interval and is interpreted as a probability, then besides loss considerations, another perspective that may help is to treat it as a Binomial outcome (a count) instead of a Bernoulli one. In particular, alongside the probabilistic response in your problem, is there a number of trials for each case? If so, the logistic regression could be re-expressed with a Binomial (count) response, where the integer count is the rounded expected value, that is, the product of the probability and the number of trials, rounded to the nearest integer.
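As a rough sketch of that last step, assuming you already have predicted probabilities and a known number of trials per case (the arrays `prob` and `n_trials` below are made-up values, not taken from the question):

```python
import numpy as np

# Hypothetical predicted probabilities and trial counts, one entry per case.
prob = np.array([0.2, 0.55, 0.9])
n_trials = np.array([10, 20, 7])

# Expected count under a Binomial model is probability times number of trials.
expected_count = prob * n_trials

# Round to the nearest integer to get a count-valued prediction.
predicted_count = np.round(expected_count).astype(int)
print(predicted_count)
```

This only applies when each record really does represent several trials; with a single 0/1 outcome per record, the number of trials is 1 and the rounding is the same as thresholding at 0.5.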

Upvotes: 0

Josef
Reputation: 22897

The predicted values are the probabilities given the explanatory variables, more precisely the probability of observing 1.
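To see why the values fall strictly between 0 and 1: a Logit model passes the linear predictor through the logistic function. Here is a minimal NumPy sketch of that computation. The coefficients `params` and the design matrix `X` are made-up values for illustration; with a fitted statsmodels model the coefficients would come from `results.params`.

```python
import numpy as np

# Hypothetical fitted coefficients: intercept and one slope (made up for illustration).
params = np.array([-1.0, 2.0])

# Hypothetical design matrix: a constant column plus one explanatory variable.
X = np.array([[1.0, -2.0],
              [1.0,  0.5],
              [1.0,  3.0]])

# The linear predictor can be any real number.
linear_predictor = X @ params

# The logistic function maps it into the open interval (0, 1).
prob_of_one = 1.0 / (1.0 + np.exp(-linear_predictor))
print(prob_of_one)
```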

To get a 0/1 prediction, you need to pick a threshold, like 0.5 for equal thresholding, and assign 1 to the probabilities above the threshold and 0 to the rest.

With numpy this would be, for example:

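threshold = 0.5  # cutoff; choose it to suit your problem
predicted = results.predict(x_for_prediction)  # probabilities of observing 1
predicted_choice = (predicted > threshold).astype(int)  # 1 above the threshold, else 0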

Upvotes: 5
