Tom

Reputation: 379

How can I use str_detect() in combination with the & operator?

When using str_detect(), you can use the | operator like so:

library(dplyr)
library(stringr)

example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)

example_df %>% filter(str_detect(letters, "B|C"))

It returns all rows except the fourth (where letters is "A D E").
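To see what filter() actually receives here, str_detect() can be called on the vector directly. This is a minimal sketch assuming only stringr is loaded; the variable names x and hits are just for illustration:

```r
library(stringr)

x <- c("A B C", "C B E", "F C B", "A D E", "F G C")

# str_detect() returns one TRUE/FALSE per element; filter() keeps the TRUE rows.
# In the pattern "B|C", | is regex alternation: match "B" or "C".
hits <- str_detect(x, "B|C")
print(hits)
#> [1]  TRUE  TRUE  TRUE FALSE  TRUE
```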

I want to do the same with str_detect(), but keep only the rows that contain a combination of letters (here, both B and C).

I assumed I could just replace the | operator with the & operator, and that the following would return all rows except the last two:

example_df <- data.frame(
   letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)

example_df %>% filter(str_detect(letters, "B&C"))

However, this doesn't work: it returns no rows. How can I make this work using str_detect() or another tidyverse method? (I can get it to work with grepl(), but I need a tidyverse solution.)
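For reference, the reason "B&C" matches nothing is that & is R's logical AND operator, not a regex operator. Inside a pattern string it is an ordinary character. A small sketch, assuming only stringr is loaded:

```r
library(stringr)

x <- c("A B C", "C B E", "F C B", "A D E", "F G C")

# In the regex "B&C", & has no special meaning: the pattern only matches the
# literal three-character text "B&C", which none of these strings contain.
no_hits <- str_detect(x, "B&C")
print(no_hits)
#> [1] FALSE FALSE FALSE FALSE FALSE

# The same pattern does match when that literal text is present.
literal_hit <- str_detect("x B&C y", "B&C")
print(literal_hit)
#> [1] TRUE
```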

Upvotes: 3

Views: 1564

Answers (2)

juarpasi

Reputation: 88

You can use the & operator inside filter() to combine two str_detect() calls:

example_df %>%
  filter(str_detect(letters, "B") &
           str_detect(letters, "C"))
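As a side note, filter() combines comma-separated conditions with &, so the same result can be written without the explicit operator. A minimal sketch, assuming dplyr and stringr are installed (res_comma and res_amp are illustrative names):

```r
library(dplyr)
library(stringr)

example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C"),
  stringsAsFactors = FALSE
)

# Comma-separated conditions in filter() are ANDed together...
res_comma <- example_df %>%
  filter(str_detect(letters, "B"), str_detect(letters, "C"))

# ...so this is equivalent to joining them with & explicitly.
res_amp <- example_df %>%
  filter(str_detect(letters, "B") & str_detect(letters, "C"))
```

Both versions keep "A B C", "C B E" and "F C B".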

You may not want to repeat str_detect() every time you need another match in the same column, especially when you have many patterns. In that case you can write a function to handle it:

example_df  |>
  str_detect_and(letters, c('B','C'))
#   letters
# 1   A B C
# 2   C B E
# 3   F C B

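Here's the definition of str_detect_and():

library(dplyr)    # filter(), enquo()
library(stringr)  # str_detect()

str_detect_and <- function(df, col_name, terms){

  # Capture the unquoted column name so it can be used inside filter()
  col_name <- enquo(col_name)

  # Filter on the first term, then keep filtering the result on each
  # remaining term; only rows matching every term survive
  df <- Reduce(
    function(dfRes, term)
      filter(dfRes, str_detect(!!col_name, term)),
    x = terms[-1],
    init = filter(df, str_detect(!!col_name, terms[1]))
  )

  return(df)
}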

Essentially, str_detect_and(example_df, letters, c('B','C')) is the same as filter(filter(example_df, str_detect(letters, 'B')), str_detect(letters, 'C')). Each additional term applies one more filter(str_detect()) on the same letters column to the result of the previous step, and applying a function step by step like that over a list of terms is exactly what Reduce() does.
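The same Reduce() idea can also be applied to logical vectors instead of data frames, which gives a condition that can be used inside an ordinary filter() call without enquo()/!!. This is only a sketch: str_detect_all() is a hypothetical helper written for illustration, not a stringr function.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not part of stringr): TRUE where `string` matches
# every pattern in `patterns`. Each str_detect() call yields a logical
# vector, and Reduce(`&`, ...) ANDs them together element by element.
# init = TRUE means an empty `patterns` vector matches everything.
str_detect_all <- function(string, patterns) {
  Reduce(`&`, lapply(patterns, function(p) str_detect(string, p)), init = TRUE)
}

example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C"),
  stringsAsFactors = FALSE
)

# Because it returns a logical vector, it composes with other filter() conditions.
res <- example_df %>% filter(str_detect_all(letters, c("B", "C")))
```

The design choice here is to return a logical vector rather than a filtered data frame, so the helper can be mixed with other conditions in the same filter() call.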

Upvotes: 0

user2554330

Reputation: 44997

You can do it using Perl-style "non-consuming lookahead":

example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C", "B B E")
)

library(tidyverse)

example_df %>% filter(str_detect(letters, "(?=.*B)(?=.*C)"))
#>   letters
#> 1   A B C
#> 2   C B E
#> 3   F C B

Created on 2022-03-23 by the reprex package (v2.0.1)

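Each (?=...) is a lookahead: it checks what follows the current position without advancing. The first one succeeds if a B appears anywhere ahead (.*B); the second, starting from the same position, succeeds if a C appears anywhere ahead. So a string matches only if it contains both letters, in either order (the extra "B B E" row shows that two B's without a C are not enough). str_detect() accepts this syntax by default, but to do the same thing with base R functions you need the perl = TRUE option, e.g.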

grep("(?=.*B)(?=.*C)", example_df$letters, perl = TRUE, value = TRUE)
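With many terms, the lookahead pattern can be built from a character vector instead of being typed by hand. A sketch under the assumption that the terms are plain letters; all_lookahead() is a hypothetical helper, and terms containing regex metacharacters such as "." or "(" would need escaping first:

```r
library(stringr)

x <- c("A B C", "C B E", "F C B", "A D E", "F G C", "B B E")

# Hypothetical helper: one (?=.*term) lookahead per term, e.g.
# c("B", "C") -> "(?=.*B)(?=.*C)". Terms are pasted in as regex fragments.
all_lookahead <- function(terms) {
  paste0("(?=.*", terms, ")", collapse = "")
}

pat <- all_lookahead(c("B", "C"))

# stringr supports lookahead by default; base R needs perl = TRUE.
keep_stringr <- x[str_detect(x, pat)]
keep_base <- grep(pat, x, perl = TRUE, value = TRUE)
```

Both keep_stringr and keep_base contain "A B C", "C B E" and "F C B".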

Upvotes: 2
