Reputation: 1002
I wrote this code to perform 5-fold cross-validation for logistic regression.
require(ISLR)
# Assign each row of Smarket to one of 5 consecutive folds
folds <- cut(seq(1, nrow(Smarket)), breaks = 5, labels = FALSE)
log_cv <- sapply(1:5, function(x) {
  set.seed(123)
  # Rows in fold x are held out for testing; the rest are used for training
  testIndexes <- which(folds == x, arr.ind = TRUE)
  testData <- Smarket[testIndexes, ]
  trainData <- Smarket[-testIndexes, ]
  glm_log <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
                 family = "binomial", data = trainData)
  # Predicted probability of "Up", thresholded at 0.5 (1 = Up, 0 = Down)
  glm.prob <- predict(glm_log, testData, "response")
  glm.pred <- ifelse(glm.prob >= 0.5, 1, 0)
  return(glm.pred)
})
The output of the above function gives the predicted values at each fold.
> head(log_cv)
  [,1] [,2] [,3] [,4] [,5]
1    1    1    0    1    1
2    0    1    1    1    1
3    0    1    1    1    0
4    1    1    0    1    1
5    1    1    1    1    1
6    1    1    1    0    1
Is there any way to combine the above results to get the confusion matrix for the 5-fold cross-validation?
Upvotes: 1
Views: 168
Reputation: 869
A confusion matrix consists of the number of true positives, false positives, true negatives and false negatives. From cross-validation, you want the average of these over the folds. You have a matrix of predictions, log_cv, which needs to be compared with the true Direction values of the corresponding test rows (testData inside your function).
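For a single fold, the four counts can be read straight off a cross-tabulation of predictions against true labels. A minimal sketch, assuming 0/1 coding with 1 as the positive class; the vectors truth_fold and pred_fold are made-up values for illustration, not taken from Smarket:

```r
# Made-up labels for one fold (hypothetical values; 1 = Up, 0 = Down)
truth_fold <- c(1, 0, 1, 1, 0, 0, 1, 0)
pred_fold  <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Fix the factor levels so both classes show up in the table
# even if one of them is never predicted
cm <- table(Predicted = factor(pred_fold, levels = c(0, 1)),
            Actual    = factor(truth_fold, levels = c(0, 1)))
print(cm)
```

Here cm["1", "1"] is the true-positive count and cm["1", "0"] the false-positive count.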
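One way, although I'm sure someone else here will recommend tidyverse, is to put the true labels into a matrix with the same shape as log_cv. testData only exists inside your function, so build it from Smarket directly; because your folds are five consecutive, equal-sized blocks of rows, filling the matrix column by column puts fold k in column k:
truth <- matrix(as.integer(Smarket$Direction == "Up"), ncol = 5)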
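A plain matrix() reshape only lines up with log_cv when the folds are consecutive, equal-sized blocks, as they are in the question. If the fold assignment were shuffled, one alternative is to pick out the labels fold by fold. A small sketch on made-up data; direction, fold_id and truth_by_fold are hypothetical names used only for illustration:

```r
# Made-up labels and a non-consecutive fold assignment (not from Smarket)
direction <- factor(c("Up", "Down", "Up", "Up", "Down", "Up"),
                    levels = c("Down", "Up"))
fold_id <- c(1, 2, 1, 2, 1, 2)

# One column per fold; rows follow the order of which(fold_id == k),
# the same order in which the test rows would be predicted
truth_by_fold <- sapply(1:2, function(k) {
  as.integer(direction[fold_id == k] == "Up")
})
print(truth_by_fold)
```

Like sapply() in the question, this only returns a matrix when every fold has the same number of rows.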
Then use logical operators to find true positives, etc., treating 1 (Up) as the positive class and comparing truth with the predictions in log_cv:
True positives:
mean(apply(truth & log_cv, 2, sum))
True negatives:
mean(apply(!truth & !log_cv, 2, sum))
False positives:
mean(apply(!truth & log_cv, 2, sum))
False negatives:
mean(apply(truth & !log_cv, 2, sum))
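The four averages can also be collected into a single 2 x 2 matrix. A rough sketch, assuming both inputs are 0/1 matrices with one column per fold and 1 as the positive class; cv_confusion is a hypothetical helper name, and the small pred and truth_toy matrices are made-up data:

```r
# Hypothetical helper: average the four confusion-matrix counts over folds.
# pred and truth are 0/1 matrices with one column per fold (1 = positive).
cv_confusion <- function(pred, truth) {
  stopifnot(all(dim(pred) == dim(truth)))
  p   <- pred == 1
  act <- truth == 1
  # colSums() gives the count in each fold; mean() averages over folds
  avg <- c(TN = mean(colSums(!p & !act)),
           FP = mean(colSums(p & !act)),
           FN = mean(colSums(!p & act)),
           TP = mean(colSums(p & act)))
  # Filled column by column: rows = predicted class, columns = actual class
  matrix(avg, nrow = 2,
         dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))
}

# Made-up example: 4 test rows in each of 2 folds
pred      <- matrix(c(1, 0, 1, 1,  0, 0, 1, 1), ncol = 2)
truth_toy <- matrix(c(1, 0, 0, 1,  0, 1, 1, 1), ncol = 2)
res <- cv_confusion(pred, truth_toy)
print(res)
```

With the objects from the question this would be called as cv_confusion(log_cv, truth), provided truth holds the true labels in the same fold layout as log_cv.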
Upvotes: 1