Reputation: 25
Preamble: the question I am about to ask is a follow-up to this discussion, which received a nice answer. I was also given extremely helpful advice here, and what I am dealing with now goes in a similar direction.
I am creating a largely automated dashboard and therefore look for ways to generalise wherever possible. I have a data frame (in long format, processed mostly with tidyverse packages) with
- different methods (A, B, C, D, ...), called METHODEKURZ
- two different outcome values (0, 1) pertaining to METHODEKURZ, which I call CLASS_INT
- a set of comorbidities (COM1, COM2, COM3, COM4, COM5), sometimes more, sometimes fewer, called COMORB
- two different outcome values (0, 1) pertaining to COMORB, which I call VALUES
Based on this information, I would like to obtain an output that looks like this:
METHODEKURZ | COMORB | Sensitivity | Specificity | PPV | NPV |
---|---|---|---|---|---|
A | COM1 | 0.49 | 0.22 | 0.31 | 0.11 |
B | COM1 | 0.31 | 0.22 | 0.22 | 0.49 |
C | COM1 | 0.22 | 0.49 | 0.31 | 0.22 |
D | COM1 | 0.49 | 0.22 | 0.31 | 0.11 |
A | COM2 | 0.22 | 0.22 | 0.49 | 0.11 |
B | COM2 | 0.49 | 0.22 | 0.31 | 0.22 |
C | COM2 | 0.31 | 0.22 | 0.31 | 0.22 |
D | COM2 | 0.31 | 0.22 | 0.31 | 0.49 |
If the goal were solely to produce such an output by METHODEKURZ, the approach shown here and reproduced below would be adequate and has been shown to work well:
library(tidyverse)
library(caret)  # provides confusionMatrix()

my_df <- structure(
  list(
    a = c('A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D'),
    b = c(0,0,1,1,0,1,0,0,0,0,1,1,1,1,1,1,1,1,1,0),
    c = c('COM1','COM1','COM1','COM1','COM2','COM2','COM2','COM2','COM3','COM3','COM3','COM3', 'COM4','COM4','COM4','COM4','COM5','COM5','COM5','COM5'),
    d = c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,1,0,1,1)
  ),
  .Names = c("METHODEKURZ", "CLASS_INT", "COMORB", "VALUES"),
  row.names = c(NA, 20L),
  class = "data.frame") %>%
  mutate(across(c(contains('VALUES')),
                ~as.factor(.))) %>%
  mutate(across(c(contains('CLASS_INT')),
                ~as.factor(.)))

# One row of statistics per method
t(sapply(sort(unique(my_df$METHODEKURZ)), function(i) {
  # Rows of q are predictions (CLASS_INT), columns are the reference (VALUES)
  q <- confusionMatrix(data = my_df$CLASS_INT[my_df$METHODEKURZ == i],
                       reference = my_df$VALUES[my_df$METHODEKURZ == i])$table
  c(sensitivity = q[1, 1] / (q[1, 1] + q[2, 1]),
    specificity = q[2, 2] / (q[2, 2] + q[1, 2]),
    ppv = q[1, 1] / (q[1, 1] + q[1, 2]),
    npv = q[2, 2] / (q[2, 2] + q[2, 1]))
}))
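For reference, the indexing in the function above depends on how the confusion table is laid out: predictions in the rows, the reference in the columns, and the first factor level ("0") in position 1. The following is a minimal base R sketch of that layout, assuming `table()` as a stand-in for the `$table` element and using made-up vectors `pred` and `ref` that are not part of the data above.

```r
# Made-up example vectors (hypothetical, for illustration only)
pred <- factor(c(0, 0, 1, 1, 0), levels = c(0, 1))  # plays the role of CLASS_INT
ref  <- factor(c(0, 1, 1, 0, 0), levels = c(0, 1))  # plays the role of VALUES

# Rows = prediction, columns = reference; level "0" comes first
q <- table(Prediction = pred, Reference = ref)

# Same formulas as above; note that they treat level "0" as the positive class
stats <- c(sensitivity = q[1, 1] / (q[1, 1] + q[2, 1]),
           specificity = q[2, 2] / (q[2, 2] + q[1, 2]),
           ppv         = q[1, 1] / (q[1, 1] + q[1, 2]),
           npv         = q[2, 2] / (q[2, 2] + q[2, 1]))

print(q)
print(stats)
```

If 1 is meant to mark the positive case, the two index positions in each formula would need to be swapped accordingly.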
However, I have COMORB as an additional variable that I would also like to take into account. Could anybody help me modify the code so that it includes COMORB as a grouping variable? I will use the output as a table, but will likely also spend some time finding a good way to visualise it. Thanks a lot in advance for your help.
Upvotes: 1
Views: 222
Reputation: 586
Store every combination of the two variables in a data frame using expand.grid, then compute the statistics from the rows that match each combination.
library(caret)

# Generate all the combinations of variables using expand.grid
var_combinations <- expand.grid("METHODEKURZ" = unique(my_df$METHODEKURZ),
                                "COMORB" = unique(my_df$COMORB))

# apply() passes each row as a character vector: i[1] is the method, i[2] the comorbidity
cbind(var_combinations, t(apply(var_combinations, 1, function(i) {
  # Logical index of the rows belonging to this method/comorbidity pair
  set_of_rows <- my_df$METHODEKURZ == i[1] & my_df$COMORB == i[2]
  q <- confusionMatrix(data = my_df$CLASS_INT[set_of_rows],
                       reference = my_df$VALUES[set_of_rows])$table
  c(sensitivity = q[1, 1] / (q[1, 1] + q[2, 1]),
    specificity = q[2, 2] / (q[2, 2] + q[1, 2]),
    ppv = q[1, 1] / (q[1, 1] + q[1, 2]),
    npv = q[2, 2] / (q[2, 2] + q[2, 1]))
})))
# METHODEKURZ COMORB sensitivity specificity ppv npv
#1 A COM1 1.0000000 0.6666667 0.6666667 1.0000000
#2 B COM1 1.0000000 0.2500000 0.2500000 1.0000000
#3 C COM1 0.3333333 0.5000000 0.5000000 0.3333333
#4 D COM1 0.0000000 0.3333333 0.0000000 0.3333333
#5 A COM2 1.0000000 0.0000000 0.6000000 NaN
#6 B COM2 0.0000000 0.5000000 0.0000000 0.6666667
#7 C COM2 1.0000000 0.5000000 0.3333333 1.0000000
#8 D COM2 0.2500000 0.0000000 0.5000000 0.0000000
#9 A COM3 0.5000000 0.0000000 0.2500000 0.0000000
#10 B COM3 1.0000000 0.2500000 0.2500000 1.0000000
#11 C COM3 0.3333333 0.5000000 0.5000000 0.3333333
#12 D COM3 0.5000000 0.0000000 0.6666667 0.0000000
#13 A COM4 0.6666667 0.0000000 0.5000000 0.0000000
#14 B COM4 1.0000000 0.5000000 0.3333333 1.0000000
#15 C COM4 1.0000000 1.0000000 1.0000000 1.0000000
#16 D COM4 0.5000000 0.3333333 0.3333333 0.5000000
#17 A COM5 0.5000000 1.0000000 1.0000000 0.3333333
#18 B COM5 0.0000000 0.7500000 0.0000000 0.7500000
#19 C COM5 1.0000000 0.6666667 0.6666667 1.0000000
#20 D COM5 0.5000000 0.0000000 0.6666667 0.0000000
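Since the question mentions the tidyverse, the same per-combination statistics could also be obtained with a grouped summary instead of expand.grid. This is only a sketch under stated assumptions: it counts the four cells directly rather than calling caret, it runs on a made-up `toy_df` with the same four column names, and the names `toy_df`, `positive` and `diag_stats` are hypothetical. The positive class is assumed to be the first factor level ("0"), which is what the `q[1, 1]` indexing above implies.

```r
library(dplyr)

# Toy data with the same four columns as my_df (values are made up)
set.seed(1)
toy_df <- data.frame(
  METHODEKURZ = rep(c("A", "B"), each = 20),
  COMORB      = rep(rep(c("COM1", "COM2"), each = 10), times = 2),
  CLASS_INT   = factor(sample(c(0, 1), 40, replace = TRUE), levels = c(0, 1)),
  VALUES      = factor(sample(c(0, 1), 40, replace = TRUE), levels = c(0, 1))
)

# Assumption: the first level ("0") is the positive class, mirroring the
# q[1, 1] indexing above; set this to "1" if 1 marks the positive case.
positive <- "0"

diag_stats <- toy_df %>%
  group_by(METHODEKURZ, COMORB) %>%
  summarise(
    # Cell counts of the 2 x 2 table for this group
    tp = sum(CLASS_INT == positive & VALUES == positive),
    fp = sum(CLASS_INT == positive & VALUES != positive),
    fn = sum(CLASS_INT != positive & VALUES == positive),
    tn = sum(CLASS_INT != positive & VALUES != positive),
    Sensitivity = tp / (tp + fn),
    Specificity = tn / (tn + fp),
    PPV         = tp / (tp + fp),
    NPV         = tn / (tn + fn),
    .groups = "drop"
  ) %>%
  select(-tp, -fp, -fn, -tn)

diag_stats
```

Because the grouping only covers combinations that actually occur in the data, empty method/comorbidity pairs simply do not appear in the result. A statistic whose denominator is zero still comes out as NaN, as in the output above.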
Raw data
I generated more values to get several observations for each combination of variables.
library(dplyr)

# For reproducibility
set.seed(123)

my_df <- structure(
  list(
    a = rep(c('A','B','C','D'), length.out = 100),
    b = sample(c(0,1), 100, replace = TRUE),
    c = c(rep('COM1',20), rep('COM2',20), rep('COM3',20), rep('COM4',20), rep('COM5',20)),
    d = sample(c(0,1), 100, replace = TRUE)
  ),
  .Names = c("METHODEKURZ", "CLASS_INT", "COMORB", "VALUES"),
  row.names = c(NA, 100L),
  class = "data.frame") %>%
  mutate(across(c(contains('VALUES')),
                ~as.factor(.))) %>%
  mutate(across(c(contains('CLASS_INT')),
                ~as.factor(.)))
Upvotes: 1