Pumpkin C
Pumpkin C

Reputation: 1502

confusionMatrix for logistic regression in R

I want to calculate two confusion matrix for my logistic regression using my training data and my testing data:

logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

i set the threshold of predicted probability at 0.5:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      train$LoanStatus_B == 1))

And the the code below works well for my training set. However, when i use the test set:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      test$LoanStatus_B == 1))

it gave me an error of

Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length

Why is this? How can I fix this? Thank you!

Upvotes: 2

Views: 48153

Answers (1)

Damiano Fantini
Damiano Fantini

Reputation: 1975

I think there is a problem with the use of predict, since you forgot to provide the new data. Also, you can use the function confusionMatrix from the caret package to compute and display confusion matrices, but you don't need to table your results before that call.

Here, I created a toy dataset that includes a representative binary target variable and then I trained a model similar to what you did.

train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5), b= rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

Now, you can predict the data (for example, your training set) and then use confusionMatrix() that takes two arguments:

  • your predictions
  • the observed classes

library(caret)
# Use your model to make predictions, in this example newdata = training set, but replace with your test set    
pdata <- predict(logitMod, newdata = train, type = "response")

# use caret and compute a confusion matrix
confusionMatrix(data = as.numeric(pdata>0.5), reference = train$LoanStatus_B)

Here are the results

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 66 33
         1  0  1

               Accuracy : 0.67            
                 95% CI : (0.5688, 0.7608)
    No Information Rate : 0.66            
    P-Value [Acc > NIR] : 0.4625          

Upvotes: 8

Related Questions