Reputation: 443
I am trying to run the following search on a data frame of text.
Here is the sample data frame, df:
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("john doe", "carol jones", "jimmy smith",
           "jenny ruiz", "joey jones", "tim brown"),
  place = c("reno nevada", "poland maine", "warsaw poland",
            "trenton new jersey", "brooklyn new york", "atlanta georgia")
)
I have a character vector containing the terms I am trying to find:
new_search <- c("poland", "jones")
I pass the vector to str_detect to find ANY of the strings in new_search in ANY of the columns in df, and then return the rows that match:
df %>%
  filter_all(any_vars(str_detect(., paste(new_search, collapse = "|"))))
Question: how can I extract the results of str_detect into a new column?
For each row that is returned, I would like to collect the terms that were successfully matched and put them in a list or character vector (matched_terms), something like this:
  id name        place             matched_terms
1 2  carol jones poland maine      c("jones", "poland")
2 3  jimmy smith warsaw poland     c("poland")
3 5  joey jones  brooklyn new york c("jones")
Upvotes: 1
Views: 1198
Reputation: 6769
This is my naive solution:
new_search <- c("poland", "jones") %>% paste(collapse = "|")
df %>%
  mutate(new_var = str_extract_all(paste(name, place), new_search))
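The snippet above adds the matches to every row, including rows where nothing matched. As a minimal sketch (not necessarily what the answerer intended), the same idea can be extended to keep only the matching rows, assuming dplyr and stringr are installed. The names pattern, result and matched_terms are illustrative choices; matched_terms comes from the question's desired output.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("john doe", "carol jones", "jimmy smith",
           "jenny ruiz", "joey jones", "tim brown"),
  place = c("reno nevada", "poland maine", "warsaw poland",
            "trenton new jersey", "brooklyn new york", "atlanta georgia")
)

# Search terms joined into a single alternation pattern
pattern <- paste(c("poland", "jones"), collapse = "|")

result <- df %>%
  # matched_terms is a list column: one character vector of matches per row
  mutate(matched_terms = str_extract_all(paste(name, place), pattern)) %>%
  # lengths() counts the matches per row; drop rows with none
  filter(lengths(matched_terms) > 0)
```

Because matched_terms stays a list column, each row keeps its matches as a real character vector, which is close to the c("jones", "poland") shape shown in the question.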
Upvotes: 5
Reputation: 388982
You can extract all the patterns from multiple columns using str_extract_all and combine them into one column with unite. Because unite collapses the columns into a single string, the empty matches are turned into "character(0)", which we remove using str_remove_all, and then we keep only the rows that have a matched term.
library(tidyverse)
pat <- str_c(new_search, collapse = "|")
df %>%
  mutate(across(-id, ~str_extract_all(., pat), .names = '{col}_new')) %>%
  unite(matched_terms, ends_with('new'), sep = ',') %>%
  mutate(matched_terms = str_remove_all(matched_terms,
                                        'character\\(0\\),?|,character\\(0\\)')) %>%
  filter(matched_terms != '')
#  id name        place             matched_terms
#1 2  carol jones poland maine      jones,poland
#2 3  jimmy smith warsaw poland     poland
#3 5  joey jones  brooklyn new york jones
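A possible variation, offered as a sketch rather than a replacement for the code above: unite the text columns first and extract afterwards, so no "character(0)" strings need to be cleaned up. It assumes dplyr, tidyr and stringr are installed; all_text is a hypothetical helper column name introduced only for this example.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Sample data copied from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("john doe", "carol jones", "jimmy smith",
           "jenny ruiz", "joey jones", "tim brown"),
  place = c("reno nevada", "poland maine", "warsaw poland",
            "trenton new jersey", "brooklyn new york", "atlanta georgia")
)

pat <- str_c(c("poland", "jones"), collapse = "|")

result <- df %>%
  # all_text (hypothetical helper) joins every column except id;
  # remove = FALSE keeps the original columns in place
  unite(all_text, -id, sep = " ", remove = FALSE) %>%
  # extract all matches per row, then collapse each set into one string;
  # rows without matches collapse to ""
  mutate(matched_terms = vapply(str_extract_all(all_text, pat),
                                paste, character(1), collapse = ",")) %>%
  select(-all_text) %>%
  filter(matched_terms != "")
```

One difference to keep in mind: joining the columns with a space means a pattern containing a space could match across a column boundary, which the per-column extraction above avoids.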
Upvotes: 2