So I'm trying to make predictions with Python's statsmodels.api, using logistic regression on a binary outcome. I'm using Logit as per the tutorials. When I make a prediction on a test dataset, the output for each record is a decimal between 0 and 1. Shouldn't it be giving me zeros and ones? Or do I have to convert these using a round function or something?
Excuse the noobiness of this question. I am just starting my journey.
If the response lies on the unit interval and is interpreted as a probability, then besides loss considerations, another perspective that may help is to treat it as a Binomial outcome, i.e. a count, rather than a Bernoulli outcome. In particular, besides the probabilistic response, does your problem have a counterpart to the number of trials for each case? If it does, the logistic regression could be re-expressed with a Binomial (count) response, where the integer count would be the rounded expected value: the product of the probability and the number of trials.
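The count computation described above can be sketched as follows. This is a minimal illustration, assuming each case comes with a known number of trials; the helper name `expected_count` and the example probabilities and trial counts are hypothetical.

```python
def expected_count(prob: float, n_trials: int) -> int:
    """Rounded expected number of successes for a Binomial(n_trials, prob) response."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    # Expected value of a Binomial count is n * p; round it to an integer count.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return round(prob * n_trials)


# Hypothetical (probability, number of trials) pairs, for illustration only.
cases = [(0.2, 10), (0.55, 20), (0.9, 3)]
for prob, n in cases:
    print(f"p={prob}, n={n}, expected count={expected_count(prob, n)}")
```

If you go the count route in statsmodels, the GLM with a Binomial family can, as far as I know, take a two-column response of (successes, failures) per case, which keeps the trial counts in the model rather than collapsing them afterwards.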
The predicted values are probabilities conditional on the explanatory variables; more precisely, each one is the estimated probability of observing 1.
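For a logit model, that probability comes from passing the linear predictor through the logistic function, which is why the output always falls strictly between 0 and 1. A minimal sketch, assuming a single explanatory variable and hypothetical fitted coefficients `b0` (intercept) and `b1` (slope):

```python
import math


def logistic(z: float) -> float:
    """Map a linear predictor z to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite to exp(z) / (1 + exp(z)) so exp() never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical fitted coefficients, for illustration only.
b0, b1 = -1.0, 2.0

for x in [-1.0, 0.0, 0.5, 2.0]:
    p = logistic(b0 + b1 * x)  # estimated P(y = 1 | x)
    print(f"x={x:+.1f}  P(y=1)={p:.3f}")
```

The result is a probability, not a class label; turning it into 0 or 1 is a separate decision.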
To get a 0/1 prediction, you need to pick a threshold, for example 0.5 to treat both outcomes symmetrically, and assign 1 to the probabilities above the threshold and 0 to the rest.
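With numpy, this would be, for example:

```python
threshold = 0.5  # cutoff for labelling a case as 1; adjust to your problem
predicted = results.predict(x_for_prediction)  # probabilities P(y = 1 | x)
predicted_choice = (predicted > threshold).astype(int)  # 0/1 labels
```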