Add one variable to specify which one of several variables that equals a value

Question

I'm learning R and looking for best practices here...

Main question

Given the tibble my_tibble:

# A tibble: 5 x 3
  chkA  chkB  chkC 
    
1 NA    NA    NA   
2 x     NA    NA   
3 NA    x     NA   
4 x     NA    NA   
5 NA    NA    x

I want to create a variable/column checked that specifies which of the variables chkA, chkB or chkC that equals "x". For each observation/row, only zero or one of those three variables can equal "x", while the rest of them are NA.

I can solve it with this code:

my_tibble <- my_tibble %>% mutate(checked = case_when(
  chkA == "x" ~ "A",
  chkB == "x" ~ "B",
  chkC == "x" ~ "C",
  TRUE ~ "(none)"
))

, which produces this tibble:

# A tibble: 5 x 4
  chkA  chkB  chkC  checked
       
1 NA    NA    NA    (none) 
2 x     NA    NA    A      
3 NA    x     NA    B      
4 x     NA    NA    A      
5 NA    NA    x     C

However, I assume/hope there could be a more convenient/elegant solution, perhaps a one-liner or something, as I think this is a frequent problem.

Bonus/follow-up question

For a "cleaner" tibble, I'd probably like to get rid of the chr variables and their NAs, by converting to lgl variables. I solved that with this code:

my_tibble <- my_tibble %>% mutate(chkA = ifelse(is.na(chkA), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkB = ifelse(is.na(chkB), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkC = ifelse(is.na(chkC), FALSE, TRUE))

, creating this tibble:

# A tibble: 5 x 3
  chkA  chkB  chkC 
    
1 FALSE FALSE FALSE
2 TRUE  FALSE FALSE
3 FALSE TRUE  FALSE
4 TRUE  FALSE FALSE
5 FALSE FALSE TRUE

Is there a better way?

akrun · Accepted Answer

Here can use across with mutate for multiple column. Also, the output of is.na is logical and it can be negated (!) to return the opposite instead of using ifelse or case_when

library(dplyr)
my_tibble %>%
     select(starts_with('chk')) %>%
     mutate(across(everything(),  ~!is.na(.)))
# A tibble: 5 x 3
#  chkA  chkB  chkC 
#    
#1 FALSE FALSE FALSE
#2 TRUE  FALSE FALSE
#3 FALSE TRUE  FALSE
#4 TRUE  FALSE FALSE
#5 FALSE FALSE TRUE

Or without anonymous function call

library(purrr)
my_tibble %>%
   select(starts_with('chk')) %>%
   mutate(across(everything(), negate(is.na)))
# A tibble: 5 x 3
#  chkA  chkB  chkC 
#    
#1 FALSE FALSE FALSE
#2 TRUE  FALSE FALSE
#3 FALSE TRUE  FALSE
#4 TRUE  FALSE FALSE
#5 FALSE FALSE TRUE

For creating the column name based on the occurrence of NA, a vectorized option is with max.col

nm1 <- sub('chk', '', names(my_tibble))[max.col(!is.na(my_tibble), 'first')]
nm1[!rowSums(!is.na(my_tibble))] <- NA_character_
my_tibble$checks <- nm1
my_tibble
# A tibble: 5 x 4
#  chkA  chkB  chkC  checks
#      
#1         
#2 x         A     
#3   x       B     
#4 x         A     
#5     x     C

An option in tidyverse to create the column would be to first rename the columns by removing the 'chk' prefix, then use imap to loop over the columns, replace the non-NA elements with column name and use coalesce to return the first non-NA value, create the 'checked' column in 'my_tibble'

library(stringr)
my_tibble <- my_tibble %>%
   rename_all(~ str_remove(., 'chk')) %>%
   imap_dfc(~  case_when(!is.na(.x) ~  .y)) %>%
   invoke(coalesce, .) %>% 
   mutate(my_tibble, checked = .)

-output

my_tibble
# A tibble: 5 x 4
#  chkA  chkB  chkC  checked
#       
#1          
#2 x         A      
#3   x       B      
#4 x         A      
#5     x     C

data

my_tibble <- structure(list(chkA = c(NA, "x", NA, "x", NA), chkB = c(NA, NA, 
"x", NA, NA), chkC = c(NA, NA, NA, NA, "x")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Add one variable to specify which one of several variables that equals a value

Main question

Bonus/follow-up question

Answers (2)

data

Related Questions