Create multiple confusion matrices in R using loops

Question

I am trying to create multiple confusion matrices from one dataframe, with each matrix generated based on a different condition in the dataframe.

So for the dataframe below, I want a confusion matrix for Value = 1, Value = 2, and Value = 3

  observed predicted Value
       1      1      1
       0      1      1
       1      0      2
       0      0      2
       1      1      3
       0      0      3

and get results like:

Value  Sensitivity  Specificity  PPV  NPV
1        .96            .71      .84  .95
2        .89            .63      .30  .45     
3        .88            .95      .28  .80

Here is what I tried, as a reproducible example. I am trying to write a loop that looks at every row, checks whether Age = 1, and then uses the values in the predicted and observed columns to generate a confusion matrix. I then manually pull the values out of the confusion matrix to calculate sensitivity, specificity, PPV, and NPV, and try to combine all the matrices together. The loop should then start again with Age = 2.

data(scat)
df<-scat %>% transmute(observed=ifelse(Site=="YOLA","case", "control"), predicted=ifelse(Location=="edge","case", "control"),Age)

x<-1 #evaluate at ages 1 through 5
for (i in dim(df)[1]) { #for every row in df
  while(x<6) { #loop stops at Age=5
    if(x=df$Age) {
      q<-confusionMatrix(data = df$predicted, reference = df$observed, positive = "case")
      sensitivity = q$table[1,1]/(q$table[1,1]+q$table[2,1])
      specificity = q$table[2,2]/(q$table[2,2]+q$table[1,2])
      ppv = q$table[1,1]/(q$table[1,1]+q$table[1,2])
      npv = q$table[2,2]/(q$table[2,2]+q$table[2,1])
      matrix(c(sensitivity, specificity, ppv, npv),ncol=4,byrow=TRUE)
    }
  }
  x <- x + 1 #confusion matrix at next Age value
}

final<- rbind(matrix) #combine all the matrices together

However, this loop is completely non-functional. I'm not sure where the error is.

Allan Cameron · Accepted Answer

Your code can be simplified and the desired output achieved like this:

library(caret)
library(dplyr)

data(scat)

df <- scat %>% 
  transmute(observed = factor(ifelse(Site == "YOLA","case", "control")), 
            predicted = factor(ifelse(Location == "edge","case", "control")),
            Age)

final <- t(sapply(sort(unique(df$Age)), function(i) { 
  
  q <- confusionMatrix(data      = df$predicted[df$Age == i],
                       reference = df$observed[df$Age == i], 
                       positive  = "case")$table
  
  c(sensitivity = q[1, 1] / (q[1, 1] + q[2, 1]),
    specificity = q[2, 2] / (q[2, 2] + q[1, 2]),
    ppv         = q[1, 1] / (q[1, 1] + q[1, 2]),
    npv         = q[2, 2] / (q[2, 2] + q[2, 1]))
}))

This results in:

final
#>      sensitivity specificity        ppv       npv
#> [1,]         0.0   0.5625000 0.00000000 0.8181818
#> [2,]         0.0   1.0000000        NaN 0.8000000
#> [3,]         0.2   0.5882353 0.06666667 0.8333333
#> [4,]         0.0   0.6923077 0.00000000 0.6923077
#> [5,]         0.5   0.6400000 0.25000000 0.8421053
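
The rows of the matrix above follow the sorted ages but carry no label, whereas the output the question asks for has a column identifying each group. The following is a minimal sketch of one way to keep that label. It makes several assumptions that are not part of the answer: it uses a hypothetical toy data frame modeled on the question's small example instead of scat, base R's table() instead of confusionMatrix, and a made-up helper called metrics.

```r
# Hypothetical toy data modeled on the question's small example (not scat).
# Setting the levels explicitly puts the positive class ("1") first and keeps
# every table 2 x 2 even when a group has no rows in one class.
toy <- data.frame(
  observed  = factor(c(1, 0, 1, 0, 1, 0), levels = c(1, 0)),
  predicted = factor(c(1, 1, 0, 0, 1, 0), levels = c(1, 0)),
  Value     = c(1, 1, 2, 2, 3, 3)
)

# Hypothetical helper: the four metrics for one group's rows.
# Rows of the table are predictions, columns are observations.
metrics <- function(d) {
  tab <- table(predicted = d$predicted, observed = d$observed)
  c(sensitivity = tab[1, 1] / (tab[1, 1] + tab[2, 1]),
    specificity = tab[2, 2] / (tab[2, 2] + tab[1, 2]),
    ppv         = tab[1, 1] / (tab[1, 1] + tab[1, 2]),
    npv         = tab[2, 2] / (tab[2, 2] + tab[2, 1]))
}

# split() names each piece after its group, so the labels survive as row names
by_value <- t(sapply(split(toy, toy$Value), metrics))

# Move the row names into a proper column
labelled <- data.frame(Value = as.numeric(rownames(by_value)), by_value)
rownames(labelled) <- NULL
labelled
```

With groups this small, some denominators are zero, so a few entries come out as NaN, just like the ppv for the second age in the output above.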

However, it's nice to know why your own code didn't work, so here are a few issues that might be useful to consider:

You need factor columns rather than character columns for confusionMatrix.
You were trying to iterate over the rows of df (in fact, for (i in dim(df)[1]) runs only once, because dim(df)[1] is a single number), but you need one iteration for each unique age, not each row in your data frame.
The line that increments x sits outside the while loop, so x never changes, the loop never terminates, and the console just hangs.
You are doing if(x = df$Age), but you need == to test equality.
It doesn't make sense to compare x to df$Age anyway, because x is length 1 and df$Age is a long vector.
You have unnecessary repetition by writing q$table each time. You can just make q equal to q$table to make your code more readable and less error-prone.
You call matrix at the end of the loop, but you don't store the result anywhere, so the whole loop doesn't actually do anything.
In the last line you are trying to rbind an object called matrix, which doesn't exist as a variable you created.
Your lack of spaces between math operators, commas and variables makes the code less readable and harder to debug. I'm not just saying this as a stylistic point; it is a major source of errors I see frequently here on SO.
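
If you would rather keep an explicit loop, the points above can be put together roughly as follows. This is only a sketch under stated assumptions: it uses a hypothetical toy data frame modeled on the small example in the question (not scat), and base R's table() stands in for confusionMatrix so that it runs without caret.

```r
# Hypothetical toy data modeled on the question's small example (not scat).
# Factor columns with explicit levels keep every table 2 x 2.
toy <- data.frame(
  observed  = factor(c(1, 0, 1, 0, 1, 0), levels = c(1, 0)),
  predicted = factor(c(1, 1, 0, 0, 1, 0), levels = c(1, 0)),
  Value     = c(1, 1, 2, 2, 3, 3)
)

values  <- sort(unique(toy$Value))         # one iteration per unique value, not per row
results <- vector("list", length(values))  # somewhere to store each iteration's output

for (k in seq_along(values)) {
  # `==` compares the whole column to one value and returns a logical vector,
  # which is used to subset the rows rather than as an if() condition
  keep <- toy$Value == values[k]

  # Rows are predictions, columns are observations; positive class comes first
  q <- table(predicted = toy$predicted[keep], observed = toy$observed[keep])

  results[[k]] <- c(Value       = values[k],
                    sensitivity = q[1, 1] / (q[1, 1] + q[2, 1]),
                    specificity = q[2, 2] / (q[2, 2] + q[1, 2]),
                    ppv         = q[1, 1] / (q[1, 1] + q[1, 2]),
                    npv         = q[2, 2] / (q[2, 2] + q[2, 1]))
}

final <- do.call(rbind, results)  # combine the stored vectors, one row per value
final
```

The for loop advances k on its own, so no separate counter or while loop is needed, and storing each result in a list before calling rbind is what makes the loop produce anything at all.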
