Output all column names where the value matches a specific number in R

Question

I have a data frame about whether a patient meets the study criteria, and each row is a patient, each column is a criterion. So some columns are inclusion criteria and some columns are exclusion criteria, and I want to output the reasons for ineligibility. For example,

test <- data.frame(A = c(0, 0, 1), 
                   B = c(0, 0, 0), 
                   C = c(0, 1, 1), 
                   D = c(1, 0, 0), 
                   E = c(1, 0, 1))

where A, B, C are inclusion criteria and D, E are exclusion criteria, and I want to output the column names (could be more than one) if the inclusion criteria == 0 or exclusion criteria == 1.

The expected output would be

output <- data.frame(A = c(0, 0, 1), 
                       B = c(0, 0, 0), 
                       C = c(0, 1, 1), 
                       D = c(1, 0, 0), 
                       E = c(1, 0, 1),
                       failed_incl = c("A, B, C", "A, B", "B"),
                       failed_excl = c("D, E", "", "E"))

Is there a way to do it efficiently without having to write out every possible scenario? The actual data frame has much more columns.

akrun · Accepted Answer

There are multiply ways. An option is to use apply to loop over the rows (MARGIN = 1), get the names of the logical vector (x== 0) and paste them together

test$failed_incl <- apply(test[1:3], 1, function(x) toString(names(x)[x == 0]))
test$failed_excl <- apply(test[4:5], 1, function(x) toString(names(x)[x == 1]))

-output

test
#  A B C D E failed_incl failed_excl
#1 0 0 0 1 1     A, B, C        D, E
#2 0 0 1 0 0        A, B            
#3 1 0 1 0 1           B           E

Or using tidyverse

library(dplyr)
test %>%
    rowwise %>% 
    mutate(failed_incl = toString(names(.)[which(c_across(A:C) == 0)]),
           failed_excl = toString(c('D', 'E')[which(c_across(D:E) == 1)])) %>% 
    ungroup
# A tibble: 3 x 7
#      A     B     C     D     E failed_incl failed_excl
#                    
#1     0     0     0     1     1 A, B, C     "D, E"     
#2     0     0     1     0     0 A, B        ""         
#3     1     0     1     0     1 B           "E"

Output all column names where the value matches a specific number in R

Answers (2)

Related Questions