Reputation: 1057
I'm trying to clean up a sample information sheet that comes from a lot of different groups and thus the treatment information I care about may be located in any number of different columns. Here's an abstracted example:
sample_info = tribble(
~id, ~could_be_here, ~or_here, ~or_even_in_this_one,
1, NA, "not_me", "find_me_other_stuff",
2, "Extra_Find_Me", NA, "diff_stuff",
3, NA, "Find_me", NA,
4, NA, "not_here", "not_here_either"
)
where I would want to find "find_me" 1) case-insensitively, 2) where it could be in any column, and 3) where it could be as part of a larger string. I want to create one column that's TRUE or FALSE for whether "find_me" was found in any column. How can I do this? (I've thought of unite-ing all columns and then just running a str_detect on that mess, but there must be a less hacky way, right?)
To be clear, I would want a final tibble that's equivalent to sample_info %>% mutate(find_me = c(TRUE, TRUE, TRUE, FALSE)).
I expect that I would want to use something like stringr::str_detect(., regex('find_me', ignore_case = T)) and pmap_lgl(any(c(...) <insert logic check>)) like in the similar cases linked below, but I'm not sure how to put them together into a mutate-compatible statement.
Things I've looked through:
Row-wise operation to see if any columns are in any other list
R: How to ignore case when using str_detect?
in R, check if string appears in row of dataframe (in any column)
Upvotes: 4
Views: 1321
Reputation: 9878
This is the typical use case for dplyr::if_any. If any of the selected columns has a match, the new column is TRUE. Use regex() with the argument ignore_case = TRUE for a case-insensitive match.
library(dplyr)
library(stringr)
sample_info |>
  mutate(find_me = if_any(-id, \(x) str_detect(x, regex("find_me", ignore_case = TRUE))))
# A tibble: 4 × 5
     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>
1     1 NA            not_me   find_me_other_stuff TRUE
2     2 Extra_Find_Me NA       diff_stuff          TRUE
3     3 NA            Find_me  NA                  TRUE
4     4 NA            not_here not_here_either     NA
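In the output above, row 4 comes back as NA rather than the FALSE the question asks for, because str_detect() returns NA for the missing cell and nothing in that row matches. One possible way around this, sketched here as an assumption rather than as part of the original answer, is to turn each NA into FALSE with dplyr::coalesce() before if_any() combines the columns:

```r
library(dplyr)
library(stringr)

# Same sample data as in the question
sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

result <- sample_info |>
  mutate(
    find_me = if_any(
      -id,
      # coalesce() replaces the NA that str_detect() gives for missing cells
      \(x) coalesce(str_detect(x, regex("find_me", ignore_case = TRUE)), FALSE)
    )
  )

result$find_me
```

With this change a row is TRUE only when at least one non-missing cell matches, and FALSE otherwise.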
Upvotes: 2
Reputation: 685
In case you did want to try the hacky way, your idea of using unite does actually work:
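library(tidyr)  # provides unite()
sample_info %>%
  unite(new, remove = FALSE) %>%  # paste all columns into a helper column
  mutate(found = str_detect(new, regex("find_me", ignore_case = TRUE))) %>%
  select(-new)                    # drop the helper column again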
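One caveat with this approach: unite() joins the cells with "_" by default, so a pattern that itself contains an underscore can match across a column boundary. The sketch below uses a made-up one-row table (edge_case, with hypothetical columns a and b) to show the effect, and a separator that cannot occur in the pattern as one possible workaround:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical edge case: no single cell contains "find_me"
edge_case <- tribble(
  ~id, ~a,     ~b,
  1,   "find", "me"
)

pattern <- regex("find_me", ignore_case = TRUE)

# Default separator "_" glues the cells into "find_me": a false positive
default_sep <- edge_case %>%
  unite(new, -id, remove = FALSE) %>%
  mutate(found = str_detect(new, pattern)) %>%
  select(-new)

# A separator that cannot appear in the pattern avoids the cross-column match
safe_sep <- edge_case %>%
  unite(new, -id, sep = " | ", remove = FALSE) %>%
  mutate(found = str_detect(new, pattern)) %>%
  select(-new)

default_sep$found
safe_sep$found
```

Whether this matters depends on the data; for the sample sheet in the question both variants give the same answer.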
Upvotes: 2
Reputation: 21938
I hope I got what you have in mind right. This is how I find all occurrences of find_me across multiple columns:
library(dplyr)
library(purrr)
library(stringr)
sample_info = tribble(
~id, ~could_be_here, ~or_here, ~or_even_in_this_one,
1, NA, "not_me", "find_me_other_stuff",
2, "Extra_Find_Me", NA, "diff_stuff",
3, NA, "Find_me", NA,
4, NA, "not_here", "not_here_either"
)
sample_info %>%
  mutate(find_me_exist = if_any(everything(), ~ str_detect(., regex("find_me", ignore_case = TRUE))))
# A tibble: 4 x 5
     id could_be_here or_here  or_even_in_this_one find_me_exist
  <dbl> <chr>         <chr>    <chr>               <lgl>
1     1 NA            not_me   find_me             TRUE
2     2 Extra_Find_me NA       diff_stuff          TRUE
3     3 NA            find_Me  NA                  TRUE
4     4 NA            not_here not_here_either     FALSE
Sorry I had to edit my code so that it is not case sensitive.
Upvotes: 3
Reputation: 40151
One dplyr and purrr option could be:
sample_info %>%
  mutate(find_me = pmap_lgl(across(-id), ~ any(str_detect(c(...), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)))
     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>
1     1 <NA>          not_me   find_me_other_stuff TRUE
2     2 Extra_Find_Me <NA>     diff_stuff          TRUE
3     3 <NA>          Find_me  <NA>                TRUE
4     4 <NA>          not_here not_here_either     FALSE
Or using just dplyr:
sample_info %>%
  rowwise() %>%
  mutate(find_me = any(str_detect(c_across(-id), regex("find_me", ignore_case = TRUE)), na.rm = TRUE))
Upvotes: 5