Reputation: 195
I'm searching for a solution to apply the confusionMatrix() function from {caret} to specific elements of a split list. I have 3 Groups, with each group having 10 observations of Actuals and 3 Preds columns.
library(caret)
set.seed(10)
dat <- data.frame(Group = c(rep(1, 10), rep(2, 10), rep(3, 10)), Actual = round(runif(30, 0, 1)),
Preds1 = round(runif(30, 0, 1)), Preds2 = round(runif(30, 0, 1)), Preds3 = round(runif(30, 0, 1)))
> dat
   Group Actual Preds1 Preds2 Preds3
1      1      1      1      0      0
2      1      0      0      0      1
3      1      0      0      0      1
4      1      1      1      0      1
...........
27     3      1      0      1      0
28     3      0      0      0      1
29     3      1      0      0      1
30     3      0      1      0      1
The final solution should create confusion matrices by Group, by each Preds column. I will need the actual confusion matrix tables, but will eventually need to extract the $overall and $byClass elements and end up with something like below.
> conf_matrix
$Preds1
      Accuracy Sensitivity Specificity
 [1,]     0.73         0.8         0.6
 [2,]     0.93        0.91           1
 [3,]     0.87        0.83           1
 [4,]      0.8        0.82        0.75
...............
[27,]      0.8        0.82        0.75
[28,]     0.58        0.67         0.5
[29,]        1        0.67           1
[30,]        1           0           1

$Preds2
      Accuracy Sensitivity Specificity
 [1,]     0.73         0.8         0.6
 [2,]     0.93        0.91           1
 [3,]     0.87        0.83           1
 [4,]      0.8        0.82        0.75
...............
[27,]      0.8        0.82        0.75
[28,]     0.58        0.67         0.5
[29,]        1        0.67           1
[30,]        1           0           1

$Preds3
...............
I have tried the script below, but keep running into issues with the secondary indexing by the Preds column within each group. I believe it has something to do with my nested lapply() calls and how I'm indexing, since the code works when I decompose it and step through it one piece at a time.
I have also tried to do this manually using table(), but abandoned that method because it does not give me consistent results the way confusionMatrix() does.
lapply(seq_along(split(dat[3:5], list(dat$Group))), function(x) {
  x_temp <- split(dat[3:5], list(dat$Group))[[x]]
  lapply(seq_along(x_temp), function(x2) {
    x_temp <- x_temp[[x2]]
    lapply(seq_along(split(dat[2], list(dat$Group))), function(y) {
      y_temp <- split(dat[2], list(dat$Group))[[y]]
      lapply(seq_along(y_temp), function(y2) {
        y_temp <- y_temp[[y2]]
        confusionMatrix(x_temp, y_temp)
      })
    })
  })
})
I may be way off base so I'm open to all suggestions and comments.
Upvotes: 2
Views: 1096
Reputation: 1417
I don't fully understand the final output you are after, but the confusion matrices themselves can be obtained as follows.
library(caret)
set.seed(10)
dat <- data.frame(Group = c(rep(1, 10), rep(2, 10), rep(3, 10)), Actual = round(runif(30, 0, 1)),
Preds1 = round(runif(30, 0, 1)), Preds2 = round(runif(30, 0, 1)), Preds3 = round(runif(30, 0, 1)))
dat[] <- lapply(dat, as.factor)
# split by group
dats <- split(dat[,-1], dat$Group)
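# one confusion matrix per group and per Preds column, flattened into a single list
cm <- do.call(c, lapply(dats, function(x) {
  actual <- x[, 1]
  # note: confusionMatrix(data, reference) takes the predictions first; here Actual is
  # passed as data, so the "Prediction" rows in the printed tables hold the Actual values
  lapply(x[, 2:4], function(y) {
    confusionMatrix(actual, unlist(y))$table
  })
}))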
cm[1:3]
$`1.Preds1`
          Reference
Prediction 0 1
         0 3 4
         1 0 3

$`1.Preds2`
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1.Preds3`
          Reference
Prediction 0 1
         0 3 4
         1 1 2
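
For the summary the question asks for (Accuracy, Sensitivity and Specificity per group, for each Preds column), one possible sketch is below. It is not part of the answer above and rests on a few assumptions: predictions are passed as `data` and actual values as `reference` (the argument order documented for confusionMatrix()), class "1" is treated as the positive class, both factor levels are set explicitly so a group with only one observed class does not break the call, and the names `pred_cols` and `metrics` are made up for illustration.

```r
library(caret)

set.seed(10)
dat <- data.frame(
  Group  = rep(1:3, each = 10),
  Actual = round(runif(30, 0, 1)),
  Preds1 = round(runif(30, 0, 1)),
  Preds2 = round(runif(30, 0, 1)),
  Preds3 = round(runif(30, 0, 1))
)

pred_cols <- c("Preds1", "Preds2", "Preds3")  # hypothetical helper name

# outer loop: one list element per Preds column
metrics <- lapply(setNames(pred_cols, pred_cols), function(p) {
  # inner loop: one confusion matrix per group
  by_group <- lapply(split(dat, dat$Group), function(g) {
    cm <- confusionMatrix(
      data      = factor(g[[p]],   levels = c(0, 1)),  # predictions
      reference = factor(g$Actual, levels = c(0, 1)),  # actual values
      positive  = "1"                                  # assumption: 1 is the positive class
    )
    c(cm$overall["Accuracy"], cm$byClass[c("Sensitivity", "Specificity")])
  })
  # stack the per-group vectors into a matrix with one row per group
  do.call(rbind, by_group)
})

metrics$Preds1
```

Each element of `metrics` is a matrix with one row per group and the three requested columns. Keeping the whole `cm` object instead of the extracted vector would also preserve `cm$table` if the raw tables are needed later.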
@Brian
In the link (What's the difference between lapply and do.call in R?), I find Paul Hiemstra's answer quite straightforward.
"lapply is similar to map, do.call is not. lapply applies a function to all elements of a list, do.call calls a function where all the function arguments are in a list. So for a n element list, lapply has n function calls, and do.call has just one function call. So do.call is quite different from lapply."
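
A small base R illustration of that difference, using a made-up list `x` (not from the original post):

```r
x <- list(1:3, 4:6, 7:9)

# lapply: one call per element, i.e. sum(1:3), sum(4:6), sum(7:9)
per_element <- lapply(x, sum)

# do.call: a single call with the elements as arguments, i.e. sum(1:3, 4:6, 7:9)
all_at_once <- do.call(sum, x)

str(per_element)  # a list of three sums
all_at_once       # one overall sum
```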
In the example, dats has three elements: 1, 2 and 3.
dats <- split(dat[,-1], dat$Group)
dats[1]
$`1`
   Actual Preds1 Preds2 Preds3
1       1      1      0      0
2       0      0      0      1
3       0      0      0      1
4       1      1      0      1
5       0      0      1      0
6       0      1      1      1
7       0      1      1      0
8       0      1      0      1
9       1      1      0      1
10      0      1      0      0
Below is a double loop: the outer loop runs over the elements 1, 2 and 3, and the inner loop over Preds1, Preds2 and Preds3. lapply() alone therefore produces a nested list, as shown below.
lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    confusionMatrix(actual, unlist(y))$table
  })
})[1]
$`1`
$`1`$Preds1
          Reference
Prediction 0 1
         0 3 4
         1 0 3

$`1`$Preds2
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1`$Preds3
          Reference
Prediction 0 1
         0 3 4
         1 1 2

However, the nested list above is not easy to use later, because another double loop is needed to reach each confusion matrix. do.call() simplifies this. Its first argument, c, is a function, and the single call amounts to c(`1` = <list for group 1>, `2` = <list for group 2>, `3` = <list for group 3>): the three inner lists are concatenated into one flat list with elements 1.Preds1, 1.Preds2, 1.Preds3, ..., which a single loop can traverse. I tend to use do.call() whenever the structure of a list has to change.
do.call(c, lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    confusionMatrix(actual, unlist(y))$table
  })
}))[1:3]
$`1.Preds1`
          Reference
Prediction 0 1
         0 3 4
         1 0 3

$`1.Preds2`
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1.Preds3`
          Reference
Prediction 0 1
         0 3 4
         1 1 2
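
As a side note, the same one-level flattening can also be sketched with unlist(recursive = FALSE). The toy example below uses made-up values in place of the confusion matrices so that it runs without {caret}; `nested`, `flat_a` and `flat_b` are illustrative names only.

```r
# toy nested list: outer names stand in for the groups, inner names for the Preds columns
nested <- lapply(list(`1` = 1:3, `2` = 4:6), function(g) {
  list(Preds1 = sum(g), Preds2 = mean(g))
})

flat_a <- do.call(c, nested)                 # the approach used in this answer
flat_b <- unlist(nested, recursive = FALSE)  # alternative: removes one nesting level

names(flat_a)
names(flat_b)

# a single loop now reaches every element
for (nm in names(flat_a)) {
  cat(nm, ":", flat_a[[nm]], "\n")
}
```

Both calls join the outer and inner names with a dot, which is where names such as 1.Preds1 come from.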
Upvotes: 2