Reputation: 1057

Search for string across entire row of a tibble?

I'm trying to clean up a sample information sheet that comes from a lot of different groups and thus the treatment information I care about may be located in any number of different columns. Here's an abstracted example:

sample_info = tribble(
  ~id, ~could_be_here, ~or_here,    ~or_even_in_this_one,
  1,   NA,             "not_me",    "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

where I would want to find "find_me" 1) case-insensitively, 2) where it could be in any column, and 3) where it could be part of a larger string. I want to create one column that's TRUE or FALSE for whether "find_me" was found in any column. How can I do this? (I've thought of uniting all columns and then just running a str_detect on that mess, but there must be a less hacky way, right?)

To be clear, I would want a final tibble that's equivalent to sample_info %>% mutate(find_me = c(TRUE, TRUE, TRUE, FALSE)).

I expect that I would want to use something like stringr::str_detect(., regex('find_me', ignore_case = T)) and pmap_lgl(any(c(...) <insert logic check>)) like in the similar cases linked below, but I'm not sure how to put them together into a mutate-compatible statement.

Things I've looked through:
Row-wise operation to see if any columns are in any other list

R: How to ignore case when using str_detect?

in R, check if string appears in row of dataframe (in any column)

Upvotes: 4

Answers (4)

GuedesBF

Reputation: 9878

This is the typical use case for dplyr::if_any. If any of the selected columns has a match, the new column is TRUE. Use regex() with the argument ignore_case = TRUE for a case-insensitive match.

library(dplyr)
library(stringr)

sample_info |>
  mutate(find_me = if_any(-id, \(x) str_detect(x, regex("find_me", ignore_case = TRUE))))

# A tibble: 4 × 5
     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>  
1     1 NA            not_me   find_me_other_stuff TRUE   
2     2 Extra_Find_Me NA       diff_stuff          TRUE   
3     3 NA            Find_me  NA                  TRUE   
4     4 NA            not_here not_here_either     NA
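
In the output above, row 4 comes out as NA rather than FALSE, because str_detect() returns NA for missing cells and NA | FALSE is NA. If a strict TRUE/FALSE column is wanted, as in the question, one possible sketch wraps the match in dplyr::coalesce() so that missing cells count as "no match" (the name result is only illustrative):

```r
library(dplyr)
library(stringr)

sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

result <- sample_info |>
  mutate(
    find_me = if_any(
      -id,
      # coalesce() replaces the NA that str_detect() gives for a missing cell with FALSE
      \(x) coalesce(str_detect(x, regex("find_me", ignore_case = TRUE)), FALSE)
    )
  )

result$find_me
```

This keeps the column-wise if_any approach and only changes how missing values are treated.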

Upvotes: 2

awaji98

Reputation: 685

In case you did want to try the hacky way, your idea of using unite does actually work:

library(tidyr)

sample_info %>%
  unite(new, remove = FALSE) %>%
  mutate(found = str_detect(new, regex("find_me", ignore_case = TRUE))) %>%
  select(-new)
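
One caveat with this approach: by default unite() pastes missing values in as the literal string "NA", so a pattern that happens to match "na" could give false positives. A hedged variant, assuming a tidyr version where unite() has the na.rm argument (1.0.0 or later), drops missing values before uniting and leaves the id column out of the search:

```r
library(dplyr)
library(tidyr)
library(stringr)

sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

result <- sample_info %>%
  # na.rm = TRUE drops missing cells instead of pasting them in as "NA"
  unite(new, -id, sep = " ", remove = FALSE, na.rm = TRUE) %>%
  mutate(found = str_detect(new, regex("find_me", ignore_case = TRUE))) %>%
  # the helper column is only needed for the search
  select(-new)

result$found
```

The space separator is an arbitrary choice here; any separator that cannot form part of the search pattern would do.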

Upvotes: 2

Anoushiravan R

Reputation: 21938

I hope I got what you have in mind right. This is how I find all find_mes across multiple columns:

library(dplyr)
library(purrr)
library(stringr)

sample_info = tribble(
  ~id, ~could_be_here, ~or_here,    ~or_even_in_this_one,
  1,   NA,             "not_me",    "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

sample_info %>%
  # search every column except the numeric id
  mutate(find_me_exist = if_any(-id, ~ str_detect(.x, regex("find_me", ignore_case = TRUE))))

# A tibble: 4 x 5
     id could_be_here or_here  or_even_in_this_one find_me_exist
  <dbl> <chr>         <chr>    <chr>               <lgl>        
1     1 NA            not_me   find_me_other_stuff TRUE         
2     2 Extra_Find_Me NA       diff_stuff          TRUE         
3     3 NA            Find_me  NA                  TRUE         
4     4 NA            not_here not_here_either     NA

Sorry, I had to edit my code so that it is not case-sensitive.

Upvotes: 3

tmfmnk

Reputation: 40151

One dplyr and purrr option could be:

sample_info %>%
 mutate(find_me = pmap_lgl(across(-id), ~ any(str_detect(c(...), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)))

     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>  
1     1 <NA>          not_me   find_me_other_stuff TRUE   
2     2 Extra_Find_Me <NA>     diff_stuff          TRUE   
3     3 <NA>          Find_me  <NA>                TRUE   
4     4 <NA>          not_here not_here_either     FALSE
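
To see what c(...) receives inside the lambda, here is a small sketch on a hypothetical two-column tibble (df, a, and b are illustrative names, not from the question). pmap() calls the function once per row and passes each column's value as a named argument, so c(...) collects one row into a single vector:

```r
library(purrr)
library(tibble)

# Hypothetical toy data, just to inspect what each call receives
df <- tibble(a = c("x", NA), b = c("y", "z"))

# One call per row; c(...) gathers that row's values into a named vector
rows <- pmap(df, ~ c(...))

rows[[1]]  # the values of row 1, named by column
```

Because a row can contain NA, the any(..., na.rm = TRUE) in the answer above is what keeps the result a strict TRUE or FALSE.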

Or using only dplyr:

sample_info %>%
 rowwise() %>%
 mutate(find_me = any(str_detect(c_across(-id), regex("find_me", ignore_case = TRUE)), na.rm = TRUE))

Upvotes: 5
