Brian

Reputation: 195

Apply confusionMatrix() to Elements of a Split List in R

I'm looking for a way to apply the confusionMatrix() function from {caret} to specific elements of a split list. I have 3 groups, each with 10 observations of an Actual column and 3 Preds columns.

library(caret)
set.seed(10)
dat <- data.frame(Group = c(rep(1, 10), rep(2, 10), rep(3, 10)), Actual = round(runif(30, 0, 1)),
              Preds1 = round(runif(30, 0, 1)), Preds2 = round(runif(30, 0, 1)), Preds3 = round(runif(30, 0, 1)))

> dat
   Group Actual Preds1 Preds2 Preds3
1      1      1      1      0      0
2      1      0      0      0      1
3      1      0      0      0      1
4      1      1      1      0      1
...........
27     3      1      0      1      0
28     3      0      0      0      1
29     3      1      0      0      1
30     3      0      1      0      1

The final solution should create confusion matrices by Group, by each Preds column. I will need the actual confusion matrix tables, but will eventually need to extract the $overall and $byClass elements and end up with something like below.

> conf_matrix
$Preds1
      Accuracy  Sensitivity  Specificity
 [1,] 0.73      0.8          0.6            
 [2,] 0.93      0.91         1              
 [3,] 0.87      0.83         1              
 [4,] 0.8       0.82         0.75
...............
[27,] 0.8       0.82         0.75           
[28,] 0.58      0.67         0.5            
[29,] 1         0.67         1              
[30,] 1         0            1

$Preds2
      Accuracy  Sensitivity  Specificity
 [1,] 0.73      0.8          0.6            
 [2,] 0.93      0.91         1              
 [3,] 0.87      0.83         1              
 [4,] 0.8       0.82         0.75    
...............
[27,] 0.8       0.82         0.75           
[28,] 0.58      0.67         0.5            
[29,] 1         0.67         1              
[30,] 1         0            1

$Preds3
...............

I have tried the script below, but keep running into issues with the secondary indexing by Preds column within each group. I believe it has something to do with my nested lapply() calls and how I'm indexing, since this works when I decompose the code and step through it one piece at a time.

I have also tried to do this manually using table(), but abandoned that approach because it does not give me results as consistent as confusionMatrix() does.

lapply(seq_along(split(dat[3:5], list(dat$Group))), function(x) {
    x_temp <- split(dat[3:5], list(dat$Group))[[x]]
    lapply(seq_along(x_temp), function(x2) {
        x_temp <- x_temp[[x2]]
        lapply(seq_along(split(dat[2], list(dat$Group))), function(y) {
            y_temp <- split(dat[2], list(dat$Group))[[y]]
            lapply(seq_along(y_temp), function(y2) {
                y_temp <- y_temp[[y2]]
                confusionMatrix(x_temp, y_temp)
            })
        })
    })
})

I may be way off base so I'm open to all suggestions and comments.

Upvotes: 2

Views: 1096

Answers (1)

Jaehyeon Kim

Reputation: 1417

I don't fully understand the desired final output, but the confusion matrices themselves can be obtained as follows.

library(caret)
set.seed(10)
dat <- data.frame(Group = c(rep(1, 10), rep(2, 10), rep(3, 10)), Actual = round(runif(30, 0, 1)),
                  Preds1 = round(runif(30, 0, 1)), Preds2 = round(runif(30, 0, 1)), Preds3 = round(runif(30, 0, 1)))
dat[] <- lapply(dat, as.factor)

# split by group
dats <- split(dat[,-1], dat$Group)

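# outer loop: one data frame per group; inner loop: one Preds column at a time
cm <- do.call(c, lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    # Note: confusionMatrix(data, reference) expects the predictions first.
    # Here the actual values are passed as `data`, so the "Prediction" rows
    # in the output below hold the actual values.
    confusionMatrix(actual, unlist(y))$table
  })
}))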
cm[1:3]
$`1.Preds1`
          Reference
Prediction 0 1
         0 3 4
         1 0 3

$`1.Preds2`
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1.Preds3`
          Reference
Prediction 0 1
         0 3 4
         1 1 2
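
For the follow-up step mentioned in the question (pulling out $overall and $byClass), a minimal sketch might look like the following. It assumes that one row per Group is the intended layout, fixes the factor levels to 0 and 1 so every table is 2x2, and passes the predictions as data and the actual values as reference, which is the argument order confusionMatrix() documents (the call above passes the actual values first). The names cols, pred_cols, groups and metrics are illustrative only.

```r
library(caret)

set.seed(10)
dat <- data.frame(
  Group  = rep(1:3, each = 10),
  Actual = round(runif(30)),
  Preds1 = round(runif(30)),
  Preds2 = round(runif(30)),
  Preds3 = round(runif(30))
)

# Fix the levels so both classes are always present in every table,
# even if a group happens to contain only 0s or only 1s in a column.
cols <- c("Actual", "Preds1", "Preds2", "Preds3")
dat[cols] <- lapply(dat[cols], factor, levels = c(0, 1))

pred_cols <- c("Preds1", "Preds2", "Preds3")
groups <- split(dat, dat$Group)

# One matrix per Preds column, one row per Group.
metrics <- lapply(setNames(pred_cols, pred_cols), function(p) {
  t(sapply(groups, function(g) {
    # Predictions go first (data), actual values second (reference).
    cm <- confusionMatrix(data = g[[p]], reference = g$Actual)
    c(cm$overall["Accuracy"], cm$byClass[c("Sensitivity", "Specificity")])
  }))
})

metrics$Preds1
```

With two classes, confusionMatrix() treats the first factor level (here 0) as the positive class unless the positive argument says otherwise, so Sensitivity and Specificity refer to class 0 in this sketch.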

@ Brian

In the linked question (What's the difference between lapply and do.call in R?), I find Paul Hiemstra's answer quite straightforward:

"lapply is similar to map, do.call is not. lapply applies a function to all elements of a list, do.call calls a function where all the function arguments are in a list. So for a n element list, lapply has n function calls, and do.call has just one function call. So do.call is quite different from lapply."
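
As a small base R illustration of that difference (the list args and the use of sum() are just made-up examples):

```r
args <- list(1:3, 4:6, 7:9)

# lapply: one call per element, i.e. sum(1:3), sum(4:6), sum(7:9),
# returning a list with three results.
per_element <- lapply(args, sum)

# do.call: a single call with the list elements as arguments,
# i.e. sum(1:3, 4:6, 7:9), returning one result.
all_at_once <- do.call(sum, args)

per_element
all_at_once
```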

In this example, dats has three elements, named 1, 2 and 3:

dats <- split(dat[,-1], dat$Group)
dats[1]
$`1`
   Actual Preds1 Preds2 Preds3
1       1      1      0      0
2       0      0      0      1
3       0      0      0      1
4       1      1      0      1
5       0      0      1      0
6       0      1      1      1
7       0      1      1      0
8       0      1      0      1
9       1      1      0      1
10      0      1      0      0

The code below is a double loop: the outer lapply() iterates over groups 1, 2 and 3, and the inner lapply() iterates over Preds1, Preds2 and Preds3. Used on its own, lapply() therefore produces a nested list, as shown below.

lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    confusionMatrix(actual, unlist(y))$table
  })
})[1]
$`1`
$`1`$Preds1
          Reference
Prediction 0 1
         0 3 4
         1 0 3

$`1`$Preds2
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1`$Preds3
          Reference
Prediction 0 1
         0 3 4
         1 1 2

However, the nested list is not easy to use later, because another double loop is needed to reach each confusion matrix. do.call() simplifies this. Its first argument, c, is a function, and calling it on the nested list effectively concatenates the inner lists into one flat list with elements 1.Preds1, 1.Preds2, 1.Preds3, ..., so every confusion matrix can be reached with a single loop. I tend to use do.call() whenever the structure of a list needs to change.

do.call(c, lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    confusionMatrix(actual, unlist(y))$table
  })
}))[1:3]
$`1.Preds1`
          Reference
Prediction 0 1
         0 3 4
         1 0 3

$`1.Preds2`
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1.Preds3`
          Reference
Prediction 0 1
         0 3 4
         1 1 2
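
To see the flattening in isolation, here is a minimal base R sketch. The list nested is a hypothetical stand-in for the lapply() result (two groups, two Preds columns, with plain table() objects instead of confusion matrices), and flat and totals are illustrative names.

```r
# Hypothetical stand-in for the nested result: groups "1" and "2",
# each holding one table per Preds column.
nested <- list(
  `1` = list(Preds1 = table(c(0, 1, 1)), Preds2 = table(c(0, 0, 1))),
  `2` = list(Preds1 = table(c(1, 1, 1)), Preds2 = table(c(0, 1, 0)))
)

# Equivalent to c(`1` = nested$`1`, `2` = nested$`2`): the inner lists are
# concatenated and the outer and inner names are joined with a dot.
flat <- do.call(c, nested)
names(flat)

# A single loop now reaches every table, e.g. to count observations.
totals <- vapply(flat, sum, integer(1))
totals
```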

Upvotes: 2
