user6571411

Reputation: 2979

Filter by multiple patterns with filter() and str_detect()

I would like to filter a data frame using filter() and str_detect(), matching multiple patterns without making multiple str_detect() calls. In the example below, I would like to filter the data frame df to show only rows containing the letters a, f, or o.

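library(dplyr)    # filter(), %>%
library(stringr)  # str_detect()

# letters (length 26) is recycled twice to fill 52 rows
df <- data.frame(numbers = 1:52, letters = letters)
df %>%
    filter(
        str_detect(letters, "a") |
        str_detect(letters, "f") |
        str_detect(letters, "o")
    )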
#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o

I have attempted the following

df %>%
    filter(
        str_detect(.$letters, c("a", "f", "o"))
     )
#  numbers letters
#1       1       a
#2      15       o
#3      32       f

and receive the following warning, with only three of the six expected rows returned:

Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)) :
  longer object length is not a multiple of shorter object length

Upvotes: 16

Views: 43841

Answers (3)

saQuist

Reputation: 447

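Building on the accepted answer, one could also define a vector of search patterns and join them with paste() using its collapse argument. Collapsing with '|' gives an 'or' search, because | is the regex alternation operator. Note that regular expressions have no corresponding '&' operator: collapsing with '&' would search for the literal string 'a&f&o', so an 'and' search needs separate str_detect() calls combined with & instead.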

This could be useful, for example, when the search patterns are automatically generated somewhere else in the script or read from a source.

library(dplyr)
library(stringr)

# Changing the column name of the letters column to `lttrs`
# to avoid confusion with the built-in vector `letters`
df <- data.frame(numbers = 1:52, lttrs = letters)

search_vec <- c('a', 'f', 'o')
# paste(..., collapse = '|') builds the single regex "a|f|o"
df %>%
    filter(str_detect(lttrs, pattern = paste(search_vec, collapse = '|')))

#  numbers lttrs
#1       1     a
#2       6     f
#3      15     o
#4      27     a
#5      32     f
#6      41     o
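
For the 'and' case, here is a minimal sketch, assuming the goal is to keep strings that contain every pattern in the vector. The `words` vector is hypothetical data invented for illustration (single letters can never contain three different letters at once), and Reduce() over one str_detect() call per pattern is just one possible way to combine the results.

```r
library(stringr)

# Hypothetical multi-character data, so that "and" is meaningful
words <- c("loaf", "foam", "oaf", "fan", "boat")
search_vec <- c("a", "f", "o")

# "or": a single regex using alternation, "a|f|o"
any_hit <- str_detect(words, paste(search_vec, collapse = "|"))

# "and": one str_detect() call per pattern, combined element-wise with `&`
all_hit <- Reduce(`&`, lapply(search_vec, function(p) str_detect(words, p)))

words[all_hit]
# "loaf" "foam" "oaf"
```

The same logical vector can be built inside filter() by replacing `words` with the column name.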

Upvotes: 1

Nana

Reputation: 13

Is this possible with an "&" rather than an "|"? (Sorry, I don't have enough rep to comment.)

Upvotes: 0

user6571411

Reputation: 2979

The correct syntax to accomplish this with filter() and str_detect() is a single regular expression that joins the patterns with |, the regex "or" operator:

df %>%
  filter(
      str_detect(letters, "a|f|o")
  )
#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o
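
As for why the vector attempt in the question returned only rows 1, 15 and 32: a pattern vector is matched element by element against the strings, not all patterns against every string. The sketch below reproduces that pairing in base R, using rep_len() and mapply() as stand-ins for the recycling that produced the warning. Those two helpers are my own illustration, not what stringr does internally, and depending on the stringr version a length mismatch like this may raise an error rather than a warning.

```r
# Same 52 values as df$letters in the question
letters_col <- rep(letters, 2)
pats <- c("a", "f", "o")

# Recycle the patterns to the data length: a, f, o, a, f, o, ...
# (52 is not a multiple of 3, hence the original warning)
recycled <- rep_len(pats, length(letters_col))

# Element-wise matching: row i is tested only against recycled[i]
elementwise <- mapply(grepl, recycled, letters_col, USE.NAMES = FALSE)
which(elementwise)
# 1 15 32

# Alternation: every row is tested against all three patterns
which(grepl("a|f|o", letters_col))
# 1 6 15 27 32 41
```

Row 6 holds "f" but is paired with the pattern "o", so it is dropped. Rows 27 and 41 are lost the same way.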

Upvotes: 51
