Reputation: 51
I built a Random Forest model in R. My outcome variable is "retain", with 1 = retained and 0 = left, and the data suffer from class imbalance (many more 0s than 1s), which already shows up in the confusion matrix for my training dataset. Based on my manual calculation, sensitivity should be 0.05 and specificity should be 0.67, which is consistent with the class imbalance. However, the numbers in the output are completely different. Below are the code and the console output (rf is my Random Forest model):
retain_p <- rf %>%
  predict(newdata = testing)
table(
  actualclass    = testing$retain,
  predictedclass = retain_p
) %>%
  confusionMatrix() %>%
  print()
Confusion Matrix and Statistics

           predictedclass
actualclass    0    1
          0 1870   36
          1  911   47

               Accuracy : 0.6693
                 95% CI : (0.6518, 0.6866)
    No Information Rate : 0.971
    P-Value [Acc > NIR] : 1

                  Kappa : 0.039

 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 0.67242
            Specificity : 0.56627
         Pos Pred Value : 0.98111
         Neg Pred Value : 0.04906
             Prevalence : 0.97102
         Detection Rate : 0.65293
   Detection Prevalence : 0.66550
      Balanced Accuracy : 0.61934

       'Positive' Class : 0
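For reference, a minimal hand check of where these numbers come from, assuming caret's table method for confusionMatrix(), which reads the rows of the supplied table as predictions and the columns as the reference, and which defaults the "positive" class to the first factor level ("0" here):

# Counts copied from the table above (rows = actualclass, columns = predictedclass)
m <- matrix(c(1870,  36,
               911,  47),
            nrow = 2, byrow = TRUE,
            dimnames = list(actualclass    = c("0", "1"),
                            predictedclass = c("0", "1")))

# confusionMatrix() treats rows as predictions and columns as reference, so
# with the default positive = "0" it reports:
sens_reported <- m["0", "0"] / sum(m[, "0"])  # 1870 / (1870 + 911) = 0.67242
spec_reported <- m["1", "1"] / sum(m[, "1"])  #   47 / (36 + 47)    = 0.56627

# The manual calculation in the question instead treats "1" as positive and
# the rows as the actual classes:
sens_manual <- m["1", "1"] / sum(m["1", ])    #   47 / (911 + 47)   = 0.04906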
Upvotes: 0
Views: 1190
Reputation: 51
In the confusionMatrix() call you have to specify which factor level is the "positive" class; by default caret takes the first level, which here is "0" (left), so sensitivity and specificity are reported for the wrong class. Note also that the table method of confusionMatrix() expects predictions in the rows and the actual (reference) classes in the columns, so the arguments to table() are swapped below:
retain_p <- rf %>%
  predict(newdata = testing)
table(
  predictedclass = retain_p,        # predictions in rows, as confusionMatrix() expects
  actualclass    = testing$retain   # reference classes in columns
) %>%
  confusionMatrix(positive = '1') %>%
  print()
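As a usage note, the same statistics can be obtained without building the table by hand by passing the factors straight to confusionMatrix(); this sketch assumes retain_p and testing$retain hold the classes "0" and "1" (the factor() wrapper below is a precaution in case retain is stored numerically):

library(caret)

# data = predicted classes, reference = actual classes; positive = "1" makes
# sensitivity and specificity refer to the retained class
confusionMatrix(data      = retain_p,
                reference = factor(testing$retain, levels = c("0", "1")),
                positive  = "1")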
Upvotes: 0