Howard Baik

Reputation: 308

Regular expression to search for whole phrases and single words at the same time in str_detect()

I'm confused about regular expressions in str_detect(). Here is the setup:

library(stringr)

vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)", "median overall survival (os)", "one- and two-year overall survival rate will be determined.", "overall survival rate (os rate) at month 6 in all participants")

str_detect(vector_to_check, "rate")
# [1] FALSE FALSE  TRUE  TRUE

str_detect(vector_to_check, "overall survival (os) in melanoma participants (parts b plus d)")
# [1] FALSE FALSE FALSE FALSE

Basically, I want to input two types of pattern in str_detect(string, pattern):

  1. Single words, like "rate", "median", etc.
  2. Whole phrases, like "overall survival (os) in melanoma participants (parts b plus d)"

Is there a regular expression (pattern) that allows for this?

Thank you

Upvotes: 1

Views: 45

Answers (2)

akrun

Reputation: 886948

Wrap the pattern with fixed(), because it contains regex metacharacters (the parentheses ()) that would otherwise need to be escaped with \\.

library(stringr)
str_detect(vector_to_check, fixed("overall survival (os) in melanoma participants (parts b plus d)"))
# [1]  TRUE FALSE FALSE FALSE
str_detect(vector_to_check, fixed("rate"))
# [1] FALSE FALSE  TRUE  TRUE
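Note that fixed("rate") is a plain substring match, so it would also match inside words such as "accurate" or "rates". If only whole words should count (an assumption about what "single words" means in the question), a word boundary \\b can be used instead of fixed(). A minimal sketch; the extra strings are hypothetical examples, not from the question:

```r
library(stringr)

vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)",
                     "median overall survival (os)",
                     "one- and two-year overall survival rate will be determined.",
                     "overall survival rate (os rate) at month 6 in all participants")

# \\b is a word boundary, so "rate" only matches as a standalone word
whole_word <- str_detect(vector_to_check, "\\brate\\b")
whole_word
# [1] FALSE FALSE  TRUE  TRUE

# Hypothetical strings showing the difference from a substring match
extra <- c("accurate measurement", "response rate")
str_detect(extra, fixed("rate"))
# [1] TRUE TRUE
str_detect(extra, "\\brate\\b")
# [1] FALSE  TRUE
```

Keep in mind that \\brate\\b will not match plural forms like "rates"; whether that is desirable depends on the data.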

To combine both, run str_detect() once per pattern and reduce the results with an element-wise OR:

library(purrr)
map(c("rate", "overall survival (os) in melanoma participants (parts b plus d)"), 
  ~ str_detect(vector_to_check, fixed(.x))) %>%
    reduce(`|`)

Output:

[1]  TRUE FALSE  TRUE  TRUE
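The same map-and-reduce idea can be sketched in base R without purrr, using grepl() with fixed = TRUE and Reduce(). This is an alternative to the answer above, not part of it:

```r
vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)",
                     "median overall survival (os)",
                     "one- and two-year overall survival rate will be determined.",
                     "overall survival rate (os rate) at month 6 in all participants")

patterns <- c("rate",
              "overall survival (os) in melanoma participants (parts b plus d)")

# One logical vector per pattern; fixed = TRUE matches each pattern literally
matches <- lapply(patterns, function(p) grepl(p, vector_to_check, fixed = TRUE))

# Element-wise OR across patterns: TRUE where at least one pattern matched
any_match <- Reduce(`|`, matches)
any_match
# [1]  TRUE FALSE  TRUE  TRUE
```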

Upvotes: 2

bdbmax

Reputation: 341

You can use the collapse argument of paste0() to join any number of patterns with the regex alternation operator | ("or"), like this:

library(stringr)

vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)", "median overall survival (os)", "one- and two-year overall survival rate will be determined.", "overall survival rate (os rate) at month 6 in all participants")

patterns <- c("rate", 
              "overall survival \\(os\\) in melanoma participants \\(parts b plus d\\)")

str_detect(vector_to_check, paste0(patterns, collapse = "|"))
# [1]  TRUE FALSE  TRUE  TRUE

I also added \\ in front of each parenthesis, because parentheses are metacharacters in regular expressions and must be escaped to match literally.
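Escaping by hand gets error-prone as the list of phrases grows. One alternative, sketched here under the assumption that stringr's default ICU regex engine is used: wrap each pattern in the ICU quoting syntax \Q...\E (written "\\Q" and "\\E" in an R string), which treats everything in between as literal text, then join with | as above:

```r
library(stringr)

vector_to_check <- c("overall survival (os) in melanoma participants (parts b plus d)",
                     "median overall survival (os)",
                     "one- and two-year overall survival rate will be determined.",
                     "overall survival rate (os rate) at month 6 in all participants")

# Patterns written literally, with no manual escaping of the parentheses
patterns <- c("rate",
              "overall survival (os) in melanoma participants (parts b plus d)")

# \Q ... \E quotes each pattern; assumes no pattern contains the text "\E"
quoted <- paste0("\\Q", patterns, "\\E", collapse = "|")

result <- str_detect(vector_to_check, quoted)
result
# [1]  TRUE FALSE  TRUE  TRUE
```

Unlike fixed(), this keeps the pattern a regular expression, so quoted phrases can still be mixed with unquoted regex pieces such as \\b word boundaries.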

Upvotes: 2
