Gal Hever

Reputation: 1

Why doesn't predict() for a logistic regression in R return a binary vector?

I am trying to fit a logistic regression where the response variable is "Chan". I used the predict function, but the vector it returns is not boolean. Does anyone know what the problem is?

Example of my data:
x1 x2 x3 x4 Chan
3 4 5 6 1 1
4 4 4 4 1 1 
5 5 3 2 3 0
3 4 3 4 2 0

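mimic_matrix$Chan <- (mimic$Chan == 1)                 # recode the response as TRUE/FALSE
training <- mimic_matrix[1:5000, ]                      # first 5000 rows are used for fitting
test <- mimic_matrix[-(1:5000), -ncol(mimic_matrix)]    # remaining rows, without the last (Chan) column
tag <- mimic_matrix[-(1:5000), ncol(mimic_matrix)]      # true Chan values for the test rows

mimic_regression <- glm(Chan ~ ., data = training, family = "binomial")
step_backward <- step(mimic_regression, direction = "backward")   # backward stepwise selection

predict_backward <- predict(step_backward, newdata = test, type = "response")
predict_backward <- (predict_backward == 1)             # convert by testing for exact equality with 1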

Upvotes: 0

Views: 3762

Answers (3)

Raghavendra Bathula

Reputation: 21

With type = "response", predict() on a logistic regression fitted with glm() returns probabilities, not classes. You can convert them to predictions (0 or 1) by applying a threshold. Which threshold to choose depends on which kind of error you consider worse. If you have no preference, 0.5 is a reasonable default. As Ken mentioned, an ROC curve will help you find a better threshold; you can install the ROCR package for that.
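A minimal sketch of the 0.5 cutoff in base R. It uses simulated data and hypothetical names (df, fit, prob, threshold) instead of the question's mimic data; in the question, prob would be the output of predict(step_backward, newdata = test, type = "response"). It also shows why testing the probabilities with == 1 gives FALSE almost everywhere: fitted probabilities from a logistic regression are practically never exactly 1.

```r
# Sketch with simulated data; df, fit, prob and threshold are hypothetical names.
set.seed(1)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$Chan <- rbinom(n, 1, plogis(0.5 * df$x1 - df$x2)) == 1  # logical response

fit <- glm(Chan ~ x1 + x2, data = df, family = "binomial")

# Predicted probabilities: numbers strictly between 0 and 1
prob <- predict(fit, newdata = df, type = "response")
sum(prob == 1)  # 0, so prob == 1 is FALSE for every row

threshold <- 0.5                          # default cutoff when neither error type is preferred
pred_logical <- prob > threshold          # TRUE/FALSE vector
pred_binary  <- as.integer(pred_logical)  # 0/1 vector

# Confusion matrix of predicted vs. actual classes
table(predicted = pred_logical, actual = df$Chan)
```

Using > rather than == is the whole fix: the model never says "exactly 1", it says "more likely than not".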

Upvotes: 2

KenHBS

Reputation: 7174

A logistic regression gives an output between 0 and 1, which represents the probability that the dependent variable equals 1 (or TRUE, or whatever the positive level of your dependent variable is). In most cases you would "predict" a value of 1 whenever the output of the logistic regression is larger than 0.5. However, it is dangerous to assume that 0.5 is the best cutoff, since the cost of misclassifying a TRUE as FALSE need not equal the cost of misclassifying a FALSE as TRUE. Think about the objective of your classification problem and choose a suitable threshold (keyword: ROC curve).
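One way to act on this, sketched in base R: sweep candidate thresholds, compute the true and false positive rates (the points of the ROC curve) and a total misclassification cost for each, and keep the cheapest threshold. The costs cost_fn and cost_fp, the simulated data and all variable names are hypothetical placeholders. For well-calibrated probabilities, the cost-minimizing rule is to predict TRUE when p > cost_fp / (cost_fp + cost_fn), about 0.17 with the costs below, so the sweep will usually pick something well below 0.5. The rates are computed on the training data only to keep the sketch short; in practice you would use held-out data such as the question's test and tag.

```r
# Hypothetical costs: a missed TRUE (false negative) is assumed to be
# 5 times as costly as a false alarm (false positive).
cost_fn <- 5
cost_fp <- 1

# Simulated data standing in for the real training set
set.seed(2)
n <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$Chan <- rbinom(n, 1, plogis(-1 + df$x1 - 0.5 * df$x2)) == 1
fit <- glm(Chan ~ x1 + x2, data = df, family = "binomial")
prob <- predict(fit, type = "response")

thresholds <- seq(0.01, 0.99, by = 0.01)

# One row per threshold: ROC coordinates plus total misclassification cost
roc <- do.call(rbind, lapply(thresholds, function(th) {
  pred <- prob > th
  tp <- sum(pred & df$Chan)
  fp <- sum(pred & !df$Chan)
  fn <- sum(!pred & df$Chan)
  tn <- sum(!pred & !df$Chan)
  data.frame(threshold = th,
             tpr  = tp / (tp + fn),  # sensitivity
             fpr  = fp / (fp + tn),  # 1 - specificity
             cost = cost_fn * fn + cost_fp * fp)
}))

# Threshold with the lowest total cost
best <- roc$threshold[which.min(roc$cost)]
best
```

plot(roc$fpr, roc$tpr, type = "l") draws the ROC curve itself; the ROCR package mentioned in Raghavendra Bathula's answer produces the same curve through its prediction() and performance() functions.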

Upvotes: 4

admccurdy

Reputation: 724

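It's returning the probability that the response equals 1 (here, that Chan is TRUE) given the covariates, not a class label. From R's help page for predict.glm, on the type argument: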

the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
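A small base-R sketch of the two scales described in that help text, on simulated data (df, fit, link and resp are hypothetical names). For the logit link, the response-scale predictions are the inverse logit (plogis()) of the link-scale ones, so neither scale gives 0/1 values without an explicit cutoff.

```r
set.seed(3)
df <- data.frame(x = rnorm(100))
df$y <- rbinom(100, 1, plogis(1.5 * df$x))
fit <- glm(y ~ x, data = df, family = "binomial")

link <- predict(fit, type = "link")      # default scale: log-odds, any real number
resp <- predict(fit, type = "response")  # probability scale: between 0 and 1

range(link)
range(resp)

# Converting log-odds to probabilities with the inverse logit
all.equal(unname(resp), plogis(unname(link)))  # TRUE

# A 0/1 prediction still needs a cutoff; p > 0.5 is equivalent to log-odds > 0
identical(unname(resp > 0.5), unname(link > 0))
```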

Upvotes: 1
