Reputation: 1043
I have two dataframes: msnbc contains a column of news transcripts called text, and dictionary contains a column of words called search. I want to return a new dataframe that includes all rows of msnbc where the text field contains one or more words from the search column. Toy data:
msnbc <- data.frame(id=c(1,2,3), text=c("hello world", "goodbye world","hello friends"))
dictionary <- data.frame(search=c("hello","lorem","ipsum","dolor"))
The new dataset should include the first and third rows of msnbc because they include one of the words from dictionary$search.
My first thought was to use str_detect, but there is no option for passing a vector of strings as the pattern. My other idea was to use filter somehow, but I'm not sure how to implement it:
new_msnbc <- msnbc %>%
  filter(dictionary$search %in% text)
But this doesn't work as intended. What is the best way to do this? Bonus points for a tidyverse solution.
Upvotes: 1
Views: 971
Reputation: 1043
It appears you can do this with filter and grepl:
result <- msnbc %>%
  filter(grepl(paste(dictionary$search, collapse="|"), text))
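For a version that stays inside the tidyverse, the same collapsed pattern should also work with str_detect. The following is only a sketch on the toy data from the question: the pattern variable and the \\b word-boundary anchors are my own additions (assumptions, not part of the original code), included so that a dictionary word such as "hello" does not match inside a longer word.

```r
library(dplyr)
library(stringr)

# Toy data from the question
msnbc <- data.frame(id = c(1, 2, 3),
                    text = c("hello world", "goodbye world", "hello friends"))
dictionary <- data.frame(search = c("hello", "lorem", "ipsum", "dolor"))

# Collapse the dictionary into one alternation: "\\b(hello|lorem|ipsum|dolor)\\b".
# The \\b anchors are an assumption about the desired behaviour (whole-word matches only).
pattern <- paste0("\\b(", paste(dictionary$search, collapse = "|"), ")\\b")

# Keep rows whose text contains at least one dictionary word
new_msnbc <- msnbc %>%
  filter(str_detect(text, pattern))
```

On the toy data this keeps the rows with id 1 and 3. One caveat: if the dictionary can contain regex metacharacters (".", "+", "(" and so on), those words would need escaping before being pasted into the pattern, and a very long dictionary may produce a slow pattern.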
Upvotes: 1