Reputation: 749
If I have a data frame with the following column:
df <- data.frame(text = c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example"))
and a vector of keywords like so:
keywords <- c("not that long", "This string", "example", "helps")
I am trying to add a column to my data frame that holds, for each row, the keywords that occur in its text:
df$keywords:
1 c("This string","not that long")
2 c("This string","not that long")
3 c("helps","example")
However, I'm unsure how to (1) extract the matching keywords from the text column and (2) store the matches for each row in the new column.
Upvotes: 4
Views: 250
Reputation: 887213
We can extract the matches with str_extract_all from stringr:
library(stringr)
# Join the keywords into one regex ("a|b|c") and extract every match in each row
df$keywords <- str_extract_all(df$text, paste(keywords, collapse = "|"))
df
# text keywords
#1 This string is not that long This string, not that long
#2 This string is a bit longer but still not that long This string, not that long
#3 This one just helps with the example helps, example
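It may help to look at the pattern being passed in: paste(keywords, collapse = "|") joins the keywords into a single regular expression in which | means "or". Because it is one regex, matches cannot overlap, so keywords that share words may not all be reported. The sketch below uses base R's regmatches()/gregexpr() with perl = TRUE (alternatives tried left to right, as in stringr's ICU engine) only to stay dependency-free; the overlapping keywords "not that" and "that long" are made up for illustration.

```r
keywords <- c("not that long", "This string", "example", "helps")

# The keywords become one regex; "|" separates the alternatives
pattern <- paste(keywords, collapse = "|")
print(pattern)

# Hypothetical overlapping keywords that share the word "that"
overlapping <- paste(c("not that", "that long"), collapse = "|")

# Once "not that" is consumed, the search resumes after it,
# so "that long" is never reported for this string
regmatches("not that long", gregexpr(overlapping, "not that long", perl = TRUE))
```

If every keyword must be found even when they overlap, a per-keyword test (as in the grepl answer below) avoids this limitation.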
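Or, in a dplyr chain (.env$keywords makes sure the keywords vector is used even if df already has a keywords column):
library(dplyr)
df %>%
  mutate(keywords = str_extract_all(text, paste(.env$keywords, collapse = "|")))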
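If stringr isn't available, base R's regmatches() combined with gregexpr() produces the same kind of list column. This is a swapped-in alternative, not part of the answer above; perl = TRUE is an assumption chosen so that alternatives are tried in order, similar to stringr's ICU engine.

```r
df <- data.frame(
  text = c("This string is not that long",
           "This string is a bit longer but still not that long",
           "This one just helps with the example"),
  stringsAsFactors = FALSE
)
keywords <- c("not that long", "This string", "example", "helps")

pattern <- paste(keywords, collapse = "|")

# gregexpr() finds all match positions per string; regmatches() extracts the text
df$keywords <- regmatches(df$text, gregexpr(pattern, df$text, perl = TRUE))

df$keywords[[3]]
```

As with str_extract_all, the matches come back in the order they appear in the text.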
Upvotes: 3
Reputation: 25385
Maybe like this:
df = data.frame(text=c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example"))
keywords <- c("not that long", "This string", "example", "helps")
df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords,grepl,x)]})
Output:
text keywords
1 This string is not that long not that long, This string
2 This string is a bit longer but still not that long not that long, This string
3 This one just helps with the example example, helps
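Note that grepl() also treats each keyword as a regular expression, so a keyword containing characters such as . or ( is not matched literally. Passing fixed = TRUE turns the check into a plain substring search. The keyword "e.g." and the sample sentence below are hypothetical, chosen only to show the difference.

```r
keywords <- c("e.g.", "helps")   # hypothetical keywords; "e.g." contains regex dots
x <- "This helps, see eggs"

# Regex matching: "." matches any character, so "e.g." also matches "eggs"
keywords[sapply(keywords, grepl, x)]

# Literal matching: fixed = TRUE looks for the exact substring "e.g."
keywords[sapply(keywords, grepl, x, fixed = TRUE)]
```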
The outer lapply loops over df$text, and the inner sapply checks, for each element of keywords, whether it occurs in the current element of df$text. So a slightly longer, but perhaps easier to read, equivalent would be:
df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords, function(y){grepl(y,x)})]})
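The same nested logic can also be spelled with Filter(), which keeps the elements of a vector for which a predicate returns TRUE. This is just an alternative way to write the sapply() indexing, sketched here on the same example data:

```r
df <- data.frame(
  text = c("This string is not that long",
           "This string is a bit longer but still not that long",
           "This one just helps with the example"),
  stringsAsFactors = FALSE
)
keywords <- c("not that long", "This string", "example", "helps")

# For each text x, keep only the keywords k that occur in x
df$keywords <- lapply(df$text, function(x) {
  Filter(function(k) grepl(k, x), keywords)
})

df$keywords[[3]]
```

Like the sapply() version, this returns the keywords in the order of the keywords vector rather than the order in which they appear in the text.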
Hope this helps!
Upvotes: 3