Reputation: 3737
I'm learning R and looking for best practices here...
Given the tibble my_tibble
:
# A tibble: 5 x 3
chkA chkB chkC
<chr> <chr> <chr>
1 NA NA NA
2 x NA NA
3 NA x NA
4 x NA NA
5 NA NA x
I want to create a variable/column checked
that specifies which of the variables chkA
, chkB
or chkC
that equals "x"
. For each observation/row, only zero or one of those three variables can equal "x"
, while the rest of them are NA
.
I can solve it with this code:
my_tibble <- my_tibble %>% mutate(checked = case_when(
chkA == "x" ~ "A",
chkB == "x" ~ "B",
chkC == "x" ~ "C",
TRUE ~ "(none)"
))
, which produces this tibble:
# A tibble: 5 x 4
chkA chkB chkC checked
<chr> <chr> <chr> <chr>
1 NA NA NA (none)
2 x NA NA A
3 NA x NA B
4 x NA NA A
5 NA NA x C
However, I assume/hope there could be a more convenient/elegant solution, perhaps a one-liner or something, as I think this is a frequent problem.
For a "cleaner" tibble, I'd probably like to get rid of the chr variables and their NA
s, by converting to lgl variables. I solved that with this code:
my_tibble <- my_tibble %>% mutate(chkA = ifelse(is.na(chkA), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkB = ifelse(is.na(chkB), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkC = ifelse(is.na(chkC), FALSE, TRUE))
, creating this tibble:
# A tibble: 5 x 3
chkA chkB chkC
<lgl> <lgl> <lgl>
1 FALSE FALSE FALSE
2 TRUE FALSE FALSE
3 FALSE TRUE FALSE
4 TRUE FALSE FALSE
5 FALSE FALSE TRUE
Is there a better way?
Upvotes: 2
Views: 98
Reputation: 887213
Here can use across
with mutate
for multiple column. Also, the output of is.na
is logical and it can be negated (!
) to return the opposite instead of using ifelse
or case_when
library(dplyr)
my_tibble %>%
select(starts_with('chk')) %>%
mutate(across(everything(), ~!is.na(.)))
# A tibble: 5 x 3
# chkA chkB chkC
# <lgl> <lgl> <lgl>
#1 FALSE FALSE FALSE
#2 TRUE FALSE FALSE
#3 FALSE TRUE FALSE
#4 TRUE FALSE FALSE
#5 FALSE FALSE TRUE
Or without anonymous function call
library(purrr)
my_tibble %>%
select(starts_with('chk')) %>%
mutate(across(everything(), negate(is.na)))
# A tibble: 5 x 3
# chkA chkB chkC
# <lgl> <lgl> <lgl>
#1 FALSE FALSE FALSE
#2 TRUE FALSE FALSE
#3 FALSE TRUE FALSE
#4 TRUE FALSE FALSE
#5 FALSE FALSE TRUE
For creating the column name based on the occurrence of NA
, a vectorized option is with max.col
nm1 <- sub('chk', '', names(my_tibble))[max.col(!is.na(my_tibble), 'first')]
nm1[!rowSums(!is.na(my_tibble))] <- NA_character_
my_tibble$checks <- nm1
my_tibble
# A tibble: 5 x 4
# chkA chkB chkC checks
# <chr> <chr> <chr> <chr>
#1 <NA> <NA> <NA> <NA>
#2 x <NA> <NA> A
#3 <NA> x <NA> B
#4 x <NA> <NA> A
#5 <NA> <NA> x C
An option in tidyverse
to create the column would be to first rename
the columns by removing the 'chk' prefix, then use imap
to loop over the columns, replace the non-NA elements with column name and use coalesce
to return the first non-NA value, create the 'checked' column in 'my_tibble'
library(stringr)
my_tibble <- my_tibble %>%
rename_all(~ str_remove(., 'chk')) %>%
imap_dfc(~ case_when(!is.na(.x) ~ .y)) %>%
invoke(coalesce, .) %>%
mutate(my_tibble, checked = .)
-output
my_tibble
# A tibble: 5 x 4
# chkA chkB chkC checked
# <chr> <chr> <chr> <chr>
#1 <NA> <NA> <NA> <NA>
#2 x <NA> <NA> A
#3 <NA> x <NA> B
#4 x <NA> <NA> A
#5 <NA> <NA> x C
my_tibble <- structure(list(chkA = c(NA, "x", NA, "x", NA), chkB = c(NA, NA,
"x", NA, NA), chkC = c(NA, NA, NA, NA, "x")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
Reputation: 39858
One dplyr
option for finding the column with x could be:
df %>%
rowwise() %>%
mutate(checked = names(.)[replace(which(c_across(everything()) == "x"), length(.) == 0, NA)])
chkA chkB chkC checked
<chr> <chr> <chr> <chr>
1 <NA> <NA> <NA> <NA>
2 x <NA> <NA> chkA
3 <NA> x <NA> chkB
4 x <NA> <NA> chkA
5 <NA> <NA> x chkC
Upvotes: 1