student_R123

Reputation: 1002

Regarding K fold cross validation in R

I wrote this code to perform 5-fold cross-validation for logistic regression.

    library(ISLR)

    # Assign each row of Smarket to one of 5 consecutive folds
    folds <- cut(seq(1, nrow(Smarket)), breaks = 5, labels = FALSE)

    log_cv <- sapply(1:5, function(x) {
      set.seed(123)

      # Hold out fold x for testing and train on the remaining folds
      testIndexes <- which(folds == x)
      testData <- Smarket[testIndexes, ]
      trainData <- Smarket[-testIndexes, ]
      glm_log <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
                     family = "binomial", data = trainData)

      # Predicted probability of "Up", thresholded at 0.5 (1 = Up, 0 = Down)
      glm.prob <- predict(glm_log, testData, type = "response")
      glm.pred <- ifelse(glm.prob >= 0.5, 1, 0)
      return(glm.pred)
    })

The output of the above function gives the predicted values at each fold.

> head(log_cv)
  [,1] [,2] [,3] [,4] [,5]
1    1    1    0    1    1
2    0    1    1    1    1
3    0    1    1    1    0
4    1    1    0    1    1
5    1    1    1    1    1
6    1    1    1    0    1
> 

Is there a way to combine these results into a confusion matrix for the 5-fold cross-validation?

Upvotes: 1

Views: 168

Answers (1)

RaphaelS

Reputation: 869

A confusion matrix consists of the counts of true positives, false positives, true negatives, and false negatives. With cross-validation, you want the average of each count over the folds. You have a matrix of predictions, log_cv, which needs to be compared against the true labels of each fold's test data.
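To make the four counts concrete, here is a minimal sketch on toy vectors. The values in `actual` and `predicted` are invented purely for illustration and have nothing to do with Smarket.

```r
# Toy labels, made up for illustration (1 = positive, 0 = negative)
actual    <- c(1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 1, 0)

tp <- sum(predicted == 1 & actual == 1)  # predicted positive, actually positive
tn <- sum(predicted == 0 & actual == 0)  # predicted negative, actually negative
fp <- sum(predicted == 1 & actual == 0)  # predicted positive, actually negative
fn <- sum(predicted == 0 & actual == 1)  # predicted negative, actually positive

c(TP = tp, TN = tn, FP = fp, FN = fn)
# TP TN FP FN
#  2  2  1  1
```

The four counts always add up to the number of observations, which is a quick sanity check on any confusion matrix.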

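One way, although I'm sure someone else here will recommend tidyverse, is to arrange the true labels in a matrix with the same shape as log_cv. testData only exists inside your function, so build the matrix from Smarket directly. Because your folds are consecutive, equal-sized blocks of rows, filling the matrix column by column puts fold k in column k:

truth <- matrix(Smarket$Direction == "Up", ncol = 5)

Then use logical operators to find true positives, etc. (a prediction of 1 means "Up"):

True positives:

mean(apply(truth & log_cv, 2, sum))

True negatives:

mean(apply(!truth & !log_cv, 2, sum))

False positives:

mean(apply(!truth & log_cv, 2, sum))

False negatives:

mean(apply(truth & !log_cv, 2, sum))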
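If you would rather have a single 2 x 2 table than four separate averages, one possible sketch is to store each fold's out-of-fold predictions in their original row positions and cross-tabulate them with table(). This assumes the ISLR package is installed and reuses the model formula from the question. The names `k`, `pred`, `test_idx`, and `cm` are my own choices, not anything from the original code.

```r
library(ISLR)  # assumed to be installed; provides the Smarket data set

k <- 5
folds <- cut(seq_len(nrow(Smarket)), breaks = k, labels = FALSE)

# Collect out-of-fold predictions in the original row order
pred <- character(nrow(Smarket))
for (i in seq_len(k)) {
  test_idx <- which(folds == i)
  fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
             family = binomial, data = Smarket[-test_idx, ])
  prob <- predict(fit, newdata = Smarket[test_idx, ], type = "response")
  # glm models the probability of the second factor level, which is "Up"
  pred[test_idx] <- ifelse(prob >= 0.5, "Up", "Down")
}

# Fix the levels so the table stays 2 x 2 even if one class is never predicted
pred <- factor(pred, levels = levels(Smarket$Direction))

# Pooled confusion matrix: counts summed over all folds
cm <- table(Predicted = pred, Actual = Smarket$Direction)
cm

# Average counts per fold, comparable to the mean(apply(...)) results
cm / k
```

Dividing the pooled table by the number of folds gives the same per-fold averages as the logical-operator approach, since every row is predicted exactly once.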

Upvotes: 1
