I am trying to create multiple confusion matrices from one dataframe, with each matrix generated based off a different condition in the dataframe.
So for the dataframe below, I want a confusion matrix for when Value = 1, Value = 2, and Value = 3:
observed predicted Value
1 1 1
0 1 1
1 0 2
0 0 2
1 1 3
0 0 3
and see the results like:
Value Sensitivity Specificity PPV NPV
1 .96 .71 .84 .95
2 .89 .63 .30 .45
3 .88 .95 .28 .80
This is what I tried with a reproducible example. I am trying to write a loop that looks at every row, determines if Age = 1, and then pulls the values from the predicted and observed columns to generate a confusion matrix. Then I manually pull the values out of the confusion matrix to compute sen, spec, ppv, and npv, and try to combine all the matrices together. Then the loop starts again with Age = 2.
data(scat)
df<-scat %>% transmute(observed=ifelse(Site=="YOLA","case", "control"), predicted=ifelse(Location=="edge","case", "control"),Age)
x<-1 #evaluate at ages 1 through 5
for (i in dim(df)[1]) { #for every row in df
while(x<6) { #loop stops at Age=5
if(x=df$Age) {
q<-confusionMatrix(data = df$predicted, reference = df$observed, positive = "case")
sensitivity = q$table[1,1]/(q$table[1,1]+q$table[2,1])
specificity = q$table[2,2]/(q$table[2,2]+q$table[1,2])
ppv = q$table[1,1]/(q$table[1,1]+q$table[1,2])
npv = q$table[2,2]/(q$table[2,2]+q$table[2,1])
matrix(c(sensitivity, specificity, ppv, npv),ncol=4,byrow=TRUE)
}
}
x <- x + 1 #confusion matrix at next Age value
}
final<- rbind(matrix) #combine all the matrices together
However, this loop is completely non-functional. I'm not sure where the error is.
Your code can be simplified and the desired output achieved like this:
library(caret)
library(dplyr)
data(scat)
df <- scat %>%
  transmute(observed = factor(ifelse(Site == "YOLA", "case", "control")),
            predicted = factor(ifelse(Location == "edge", "case", "control")),
            Age)
final <- t(sapply(sort(unique(df$Age)), function(i) {
  q <- confusionMatrix(data = df$predicted[df$Age == i],
                       reference = df$observed[df$Age == i],
                       positive = "case")$table
  c(sensitivity = q[1, 1] / (q[1, 1] + q[2, 1]),
    specificity = q[2, 2] / (q[2, 2] + q[1, 2]),
    ppv = q[1, 1] / (q[1, 1] + q[1, 2]),
    npv = q[2, 2] / (q[2, 2] + q[2, 1]))
}))
Resulting in
final
#> sensitivity specificity ppv npv
#> [1,] 0.0 0.5625000 0.00000000 0.8181818
#> [2,] 0.0 1.0000000 NaN 0.8000000
#> [3,] 0.2 0.5882353 0.06666667 0.8333333
#> [4,] 0.0 0.6923077 0.00000000 0.6923077
#> [5,] 0.5 0.6400000 0.25000000 0.8421053
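The same per-group idea can also be sketched in base R without caret, which makes it easy to keep the grouping value as a labelled column as in the layout the question asks for. This is only an illustration on the six-row toy table from the question; it assumes the 0/1 coding shown there (1 = case), and the helper name metrics and the objects toy, groups, by_value and result are made up for this sketch.

```r
# Toy data copied from the question's example table (1 = case, 0 = control)
toy <- data.frame(
  observed  = c(1, 0, 1, 0, 1, 0),
  predicted = c(1, 1, 0, 0, 1, 0),
  Value     = c(1, 1, 2, 2, 3, 3)
)

# Hypothetical helper: the four metrics for one group's vectors
metrics <- function(observed, predicted) {
  tp <- sum(predicted == 1 & observed == 1)  # true positives
  fp <- sum(predicted == 1 & observed == 0)  # false positives
  fn <- sum(predicted == 0 & observed == 1)  # false negatives
  tn <- sum(predicted == 0 & observed == 0)  # true negatives
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

# Split the rows by Value, compute one metrics vector per group,
# and keep the Value itself as the first column of the result
groups   <- split(toy, toy$Value)
by_value <- t(sapply(groups, function(g) metrics(g$observed, g$predicted)))
result   <- data.frame(Value = as.numeric(names(groups)), by_value,
                       row.names = NULL)
result
```

With only two rows per group, some denominators are zero, so a few entries come out as NaN (for example the NPV for Value = 1, where nothing is predicted negative), just like the NaN in the caret output above.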
However, it's nice to know why your own code didn't work, so here are a few issues that might be useful to consider:
- You need factor columns rather than character columns for confusionMatrix.
- You are iterating over the rows of df, but you need one iteration for each unique age, not each row in your data frame. (As written, for (i in dim(df)[1]) actually runs only once, because dim(df)[1] is a single number.)
- The increment of x happens outside of the while loop, so x never increments and the loop never terminates, so the console just hangs.
- You wrote if(x = df$Age), but you need a == to test equality.
- It doesn't make sense to compare x to df$Age anyway, because x is length 1 and df$Age is a long vector.
- You repeat q$table each time. You can just make q equal to q$table to make your code more readable and less error-prone.
- You call matrix at the end of the loop, but you don't store it anywhere, so the whole loop doesn't actually do anything.
- You try to rbind an object called matrix in the last line, which doesn't exist.