timnus

Reputation: 197

How can I filter a vector based on regex list in R?

I have a character string vector that I would like to filter based on keywords from a second vector.

Below is a small reprex:

list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple","banana")

I presume I will need stringr/stringi, but in essence I would like to do something along the lines of list1 %in% fruit and have it return TRUE, FALSE, TRUE.

Any suggestions?

Upvotes: 0

Views: 398

Answers (2)

Chris Ruehlemann

Reputation: 21400

A solution with str_detect:

library(tidyverse)
data.frame(list1) %>%
  mutate(Flag = str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1  Flag
1           I like apples  TRUE
2             I eat bread FALSE
3 Bananas are my favorite  TRUE

If you want to filter (i.e., subset) your data:

data.frame(list1) %>%
  filter(str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1
1           I like apples
2 Bananas are my favorite

Note that (?i) is used to make the match case-insensitive.
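
As a rough sketch of what the pattern looks like once assembled, the snippet below builds it step by step and compares the inline (?i) flag with stringr's regex(ignore_case = TRUE) wrapper, which should behave the same way here. The variable names pattern, inline_flag and wrapper_flag are only illustrative and are not part of the answer above.

```r
library(stringr)

list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# Collapse the keywords into one alternation pattern: "apple|banana"
pattern <- paste0(fruit, collapse = "|")

# Option 1: inline flag, as used in the answer
inline_flag <- str_detect(list1, paste0("(?i)", pattern))

# Option 2: regex() modifier with ignore_case = TRUE
wrapper_flag <- str_detect(list1, regex(pattern, ignore_case = TRUE))

print(pattern)
print(identical(inline_flag, wrapper_flag))
```

Either form can be dropped into mutate() or filter(); the regex() wrapper may be easier to read when the pattern is stored in a variable.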

EDIT:

To record the matches in a separate column, you can use str_extract (if you expect just one match per string) or str_extract_all (if you expect more than one match):

data.frame(list1) %>%
  mutate(Flag = str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))),
         Match = str_extract_all(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1  Flag  Match
1           I like apples  TRUE  apple
2             I eat bread FALSE       
3 Bananas are my favorite  TRUE Banana
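
To illustrate the difference between the two functions, here is a minimal sketch using a made-up string that contains both keywords. The string x and the names first_match and all_matches are assumptions for illustration only.

```r
library(stringr)

# Hypothetical input containing two keywords
x <- "Apples and bananas"
pattern <- "(?i)apple|banana"

# str_extract keeps only the first match in each string
first_match <- str_extract(x, pattern)

# str_extract_all returns a list with one character vector per input string
all_matches <- str_extract_all(x, pattern)

print(first_match)
print(all_matches)
```

Because str_extract_all returns a list, the resulting Match column in a data frame is a list column rather than a plain character column.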

Upvotes: 2

benson23

Reputation: 19097

We can do this with grepl without using external packages.

grepl takes a regular expression, and | acts as alternation in a regex, so we can first collapse the strings in fruit into a single pattern with | as the separator.

Remember to set ignore.case = TRUE if you don't care about case (note the "banana" in your example has different case).

grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)
[1]  TRUE FALSE  TRUE

Or use grep to directly output the strings that match:

# same as list1[grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)]
grep(paste(fruit, collapse = "|"), list1, ignore.case = TRUE, value = TRUE)
[1] "I like apples"           "Bananas are my favorite"
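
If it also matters which keyword matched, one possible base R variant (a sketch, not part of the answer above) is to run grepl once per keyword and combine the results. The names hits and any_hit are hypothetical.

```r
list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# One column per keyword, one row per string in list1
hits <- sapply(fruit, grepl, x = list1, ignore.case = TRUE)

# A string is kept if at least one keyword matched
any_hit <- rowSums(hits) > 0

print(hits)
print(list1[any_hit])
```

For the example data, any_hit should agree with the single collapsed pattern, while hits additionally shows the per-keyword breakdown.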

Upvotes: 2
