Saleem Khan

Reputation: 749

Add a column listing the keywords (strings) found in a text column

If I have a data frame with the following column:

df$text <- c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example")

and strings like so:

keywords <- c("not that long", "This string", "example", "helps")

I am trying to add a column to my dataframe with a list of the keywords that exist in the text for each row:

df$keywords:

1 c("This string","not that long")    
2 c("This string","not that long")    
3 c("helps","example")

However, I'm unsure how to 1) extract the matching keywords from the text column and 2) store those matches as a list in each row of the new column.

Upvotes: 4

Views: 250

Answers (2)

akrun

Reputation: 887213

We can extract all matches with str_extract_all from stringr, joining the keywords into a single alternation pattern:

library(stringr)
df$keywords <- str_extract_all(df$text, paste(keywords, collapse = "|"))
df
#                                                text                   keywords
#1                        This string is not that long This string, not that long
#2 This string is a bit longer but still not that long This string, not that long
#3                This one just helps with the example             helps, example

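Or in a chain (starting from a df that does not already have a keywords column, so that keywords inside mutate() refers to the vector rather than to a column):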

library(dplyr)
df %>%
   mutate(keywords = str_extract_all(text, paste(keywords, collapse = "|")))
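
For readers who prefer to avoid the stringr dependency, the same alternation idea can be sketched in base R with gregexpr() and regmatches(). This is only a minimal sketch: it assumes the keywords contain no regex metacharacters and do not overlap one another in the text, and the variable names `text`, `pattern` and `found` are chosen here for illustration.

```r
text <- c("This string is not that long",
          "This string is a bit longer but still not that long",
          "This one just helps with the example")
keywords <- c("not that long", "This string", "example", "helps")

# Build one alternation pattern: "not that long|This string|example|helps"
pattern <- paste(keywords, collapse = "|")

# gregexpr() locates every match in each string; regmatches() extracts them,
# returning a list with one character vector per element of `text`
found <- regmatches(text, gregexpr(pattern, text))
```

As with str_extract_all, the matches come back in the order they appear in the text, and the resulting list could be assigned to df$keywords.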

Upvotes: 3

Florian

Reputation: 25385

Maybe like this:

df = data.frame(text=c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example"))
keywords <- c("not that long", "This string", "example", "helps")

df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords,grepl,x)]})

Output:

                                                 text                   keywords
1                        This string is not that long not that long, This string
2 This string is a bit longer but still not that long not that long, This string
3                This one just helps with the example             example, helps

The outer lapply loops over df$text, and the inner sapply checks, for each element of keywords, whether it occurs in the current element of df$text. A slightly longer but perhaps easier-to-read equivalent would be:

df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords, function(y){grepl(y,x)})]})
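
Note that grepl() treats each keyword as a regular expression by default. If the keywords should be matched literally, the same nested loop could be sketched with fixed = TRUE, as below. The helper name `find_keywords` is hypothetical and introduced only for this illustration; vapply() is used in place of sapply() so that the result is always a logical vector.

```r
text <- c("This string is not that long",
          "This string is a bit longer but still not that long",
          "This one just helps with the example")
keywords <- c("not that long", "This string", "example", "helps")

# Hypothetical helper: return the keywords that occur literally in one string
find_keywords <- function(x, keywords) {
  # One TRUE/FALSE per keyword; fixed = TRUE disables regex interpretation
  hits <- vapply(keywords, function(k) grepl(k, x, fixed = TRUE), logical(1))
  keywords[hits]
}

# One character vector of matched keywords per text element
result <- lapply(text, find_keywords, keywords = keywords)
```

Here the matches are returned in the order of the keywords vector, not in the order they appear in the text, which is the same behaviour as the lapply/sapply version above.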

Hope this helps!

Upvotes: 3
