John Sean

Reputation: 37

Logistic regression confusion matrix problem

I tried computing a confusion matrix for my model, but I keep getting:

Error: data and reference should be factors with the same levels.

Below is my model:

model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)
confusionMatrix(table(predict(model3, newdata=test_set, type="response")) >= 0.5,
                      train_set$winner == 1)

The winner variable contains team1 and team2.
srs.1 and srs.2 are numeric values.

What is my problem here?

Upvotes: 3

Views: 138

Answers (1)

StupidWolf

Reputation: 47008

I'll assume your winner label is binary, coded as 0/1. So let's use the example below:

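library(caret)
set.seed(111)

# Simulated predictors standing in for the real srs.1 / srs.2 columns
data = data.frame(
  srs.1 = rnorm(200),
  srs.2 = rnorm(200)
)

# Binary 0/1 outcome: 1 when the two predictors share the same sign
data$winner = ifelse(data$srs.1*data$srs.2 > 0,1,0)

# 150 rows for training, the remaining 50 for testing
idx = sample(nrow(data),150)
train_set = data[idx,]
test_set = data[-idx,]

model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)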

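As you did, we predict the probability and label it 1 if it is greater than 0.5, otherwise 0. Your use of table() was roughly right, but two things went wrong. First, the >= 0.5 comparison was applied to the result of table() rather than to the predicted probabilities. Second, the predictions came from test_set while the reference came from train_set. Both must come from the same data set, either test_set or train_set: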

# Threshold the predicted probabilities, then take the labels from the same set
pred = as.numeric(predict(model3, newdata=test_set, type="response")>0.5)
ref = test_set$winner

confusionMatrix(table(pred,ref))

Confusion Matrix and Statistics

    ref
pred  0  1
   0 12  5
   1 19 14

               Accuracy : 0.52            
                 95% CI : (0.3742, 0.6634)
    No Information Rate : 0.62            
    P-Value [Acc > NIR] : 0.943973        

                  Kappa : 0.1085  
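
Since the question says winner actually holds team1 and team2 rather than 0/1, here is a minimal sketch of the same idea with factor labels. It reuses the simulated data from above, so the "team1"/"team2" coding, the level order and the choice of "team2" as the positive class are all assumptions, not your real data. The point is to give the predictions and the reference the same factor levels, which is what the error message is asking for:

```r
library(caret)

set.seed(111)
# Hypothetical data: same simulation as above, but with team labels
data <- data.frame(srs.1 = rnorm(200), srs.2 = rnorm(200))
data$winner <- factor(ifelse(data$srs.1 * data$srs.2 > 0, "team2", "team1"),
                      levels = c("team1", "team2"))

idx <- sample(nrow(data), 150)
train_set <- data[idx, ]
test_set <- data[-idx, ]

model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)

# With a factor response, glm treats the first level as "failure",
# so these are the probabilities of the second level ("team2")
prob <- predict(model3, newdata = test_set, type = "response")

# Build predictions and reference as factors with identical levels
lv <- levels(train_set$winner)
pred <- factor(ifelse(prob > 0.5, lv[2], lv[1]), levels = lv)
ref <- factor(test_set$winner, levels = lv)

cm <- confusionMatrix(data = pred, reference = ref, positive = lv[2])
print(cm$table)
```

Setting levels explicitly on both factors matters even when one class is never predicted: without it, pred could end up with a single level and confusionMatrix() would complain again.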

Upvotes: 2
