Karen Liu
Karen Liu

Reputation: 115

Output all column names where the value matches a specific number in R

I have a data frame about whether a patient meets the study criteria, and each row is a patient, each column is a criterion. So some columns are inclusion criteria and some columns are exclusion criteria, and I want to output the reasons for ineligibility. For example,

test <- data.frame(A = c(0, 0, 1), 
                   B = c(0, 0, 0), 
                   C = c(0, 1, 1), 
                   D = c(1, 0, 0), 
                   E = c(1, 0, 1))

where A, B, C are inclusion criteria and D, E are exclusion criteria, and I want to output the column names (could be more than one) if the inclusion criteria == 0 or exclusion criteria == 1.

The expected output would be

output <- data.frame(A = c(0, 0, 1), 
                       B = c(0, 0, 0), 
                       C = c(0, 1, 1), 
                       D = c(1, 0, 0), 
                       E = c(1, 0, 1),
                       failed_incl = c("A, B, C", "A, B", "B"),
                       failed_excl = c("D, E", "", "E"))

Is there a way to do it efficiently without having to write out every possible scenario? The actual data frame has much more columns.

Upvotes: 1

Views: 632

Answers (2)

akrun
akrun

Reputation: 887068

There are multiply ways. An option is to use apply to loop over the rows (MARGIN = 1), get the names of the logical vector (x== 0) and paste them together

test$failed_incl <- apply(test[1:3], 1, function(x) toString(names(x)[x == 0]))
test$failed_excl <- apply(test[4:5], 1, function(x) toString(names(x)[x == 1]))

-output

test
#  A B C D E failed_incl failed_excl
#1 0 0 0 1 1     A, B, C        D, E
#2 0 0 1 0 0        A, B            
#3 1 0 1 0 1           B           E

Or using tidyverse

library(dplyr)
test %>%
    rowwise %>% 
    mutate(failed_incl = toString(names(.)[which(c_across(A:C) == 0)]),
           failed_excl = toString(c('D', 'E')[which(c_across(D:E) == 1)])) %>% 
    ungroup
# A tibble: 3 x 7
#      A     B     C     D     E failed_incl failed_excl
#  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>       <chr>      
#1     0     0     0     1     1 A, B, C     "D, E"     
#2     0     0     1     0     0 A, B        ""         
#3     1     0     1     0     1 B           "E"    

Upvotes: 1

andrew_reece
andrew_reece

Reputation: 21264

Here's a tidyverse approach, which pivots the column names and then summarizes based on the inclusion/exclusion conditions:

failed_df <-
  test %>%
  add_rownames() %>%
  pivot_longer(-rowname) %>%
  group_by(rowname) %>%
  summarise(failed_incl = paste(name[value == 0 & name %in% c("A", "B", "C")], collapse = ", "),
            failed_excl = paste(name[value == 1 & name %in% c("D", "E")], collapse = ", ")) %>%
  select(-rowname)

bind_cols(test, failed_df)

  A B C D E failed_incl failed_excl
1 0 0 0 1 1     A, B, C        D, E
2 0 0 1 0 0        A, B            
3 1 0 1 0 1           B           E

There may be a more elegant way to do this with rowwise and c_across.

Upvotes: 0

Related Questions