Reputation: 379
When using str_detect() you can use the | operator like so...
example_df <- data.frame(
letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)
example_df %>% filter(str_detect(letters, "B|C"))
And it will return all rows except the fourth (where letters = "A D E").
I want to do the same with str_detect() but looking for a combination of letters.
I imagined you could just replace the | operator with the & operator and the following would return all rows except the last two.
example_df <- data.frame(
letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)
example_df %>% filter(str_detect(letters, "B&C"))
However, this doesn't work. Does anyone know how I can make this work using str_detect or another tidyverse method? (I can get it to work with grepl, but I need a tidyverse solution.)
Upvotes: 3
Views: 1564
Reputation: 88
You can use the & operator inside filter() and add another str_detect() call:
example_df %>%
  filter(str_detect(letters, "B") &
         str_detect(letters, "C"))
Maybe you don't want to repeat str_detect() each time you need another match in the same column, especially when you have many patterns. You can write a function to handle this situation:
example_df |>
str_detect_and(letters, c('B','C'))
# letters
# 1 A B C
# 2 C B E
# 3 F C B
Here's the definition of str_detect_and
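str_detect_and <- function(df, col_name, terms){
  # Capture the unquoted column name so it can be spliced into filter()
  col_name <- enquo(col_name)
  # Start from the rows matching the first term, then filter the result
  # again for each remaining term
  df <- Reduce(
    function(dfRes, term)
      filter(dfRes, str_detect(!!col_name, term)),
    x = terms[-1],
    init = filter(df, str_detect(!!col_name, terms[1]))
  )
  return(df)
}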
Essentially, str_detect_and(example_df, letters, c('B','C')) is the same as filter(filter(example_df, str_detect(letters, 'B')), str_detect(letters, 'C')), and if you need it, you can pass the result into another filter(str_detect) on the same letters column. That's why we can use the Reduce function.
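As a possible variation on the same idea, the sketch below combines the logical vectors with & inside a single filter() call instead of nesting filters. The helper name str_detect_all is hypothetical (it is not part of stringr), and the sketch assumes terms holds at least one pattern.

```r
library(dplyr)
library(stringr)

example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)

# Hypothetical helper, not part of stringr: TRUE where every term is found.
# Assumes `terms` contains at least one pattern.
str_detect_all <- function(string, terms) {
  # One logical vector per term
  hits <- lapply(terms, function(term) str_detect(string, term))
  # Combine the vectors element-wise with `&`
  Reduce(`&`, hits)
}

result <- example_df %>% filter(str_detect_all(letters, c("B", "C")))
print(result)
```

Because the helper returns a plain logical vector, it should also work inside mutate() or on a bare character vector, not only in filter().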
Upvotes: 0
Reputation: 44997
You can do it using Perl-style "non-consuming lookahead":
example_df <- data.frame(
letters = c("A B C", "C B E", "F C B", "A D E", "F G C", "B B E")
)
library(tidyverse)
example_df %>% filter(str_detect(letters, "(?=.*B)(?=.*C)"))
#> letters
#> 1 A B C
#> 2 C B E
#> 3 F C B
Created on 2022-03-23 by the reprex package (v2.0.1)
This looks for anything followed by B, but doesn't advance; then it looks for anything followed by C. That's accepted by default in str_detect, but if you wanted to do the same sort of thing in base R functions, you'd need the perl = TRUE option, e.g.
grep("(?=.*B)(?=.*C)", example_df$letters, perl = TRUE, value = TRUE)
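If the list of letters is long, the lookahead pattern could be assembled from a vector rather than typed by hand. This is only a sketch: the names terms, pattern, x and matched are illustrative, and it assumes the terms contain no regex metacharacters that would need escaping.

```r
library(stringr)

# Illustrative terms; assumed to contain no regex metacharacters
terms <- c("B", "C")

# Wrap each term in a lookahead and concatenate them into one pattern
pattern <- paste0("(?=.*", terms, ")", collapse = "")

x <- c("A B C", "C B E", "F C B", "A D E", "F G C", "B B E")

# Keep only the strings that contain every term
matched <- x[str_detect(x, pattern)]
print(pattern)
print(matched)
```

The same pattern string can be passed to grep() with perl = TRUE, as in the base R example above.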
Upvotes: 2