I have a data frame of text comments and a character vector of keywords. For each row, I want to check which keywords appear in the comment and put those keywords into a new column.
My current code flags each row whose comment contains any of the keywords (`grepl()` returns TRUE/FALSE, shown as 1/0 in the table below). I want to replace that flag with the matching keywords themselves.
```r
keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')
df <- transform(df, Topic=grepl(paste0(keywords,collapse='|'), df$Comment))
```
Category | Comment | Topic |
---|---|---|
Sales | i have to use my email everyday and they dont work the poor communication is not acceptable | 1 |
Marketing | i think the tools are not adequate for the tasks we want to achieve | 1 |
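The flag comes from `grepl()`, which returns a logical vector. Here is a minimal sketch of how to get the 1/0 column shown above, assuming the sample data matches the table. Note that the Marketing comment also contains "tools", so `grepl()` flags that row too.

```r
# Sample data reconstructed from the table above (assumed)
df <- data.frame(
  Category = c("Sales", "Marketing"),
  Comment = c("i have to use my email everyday and they dont work the poor communication is not acceptable",
              "i think the tools are not adequate for the tasks we want to achieve")
)
keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')

# grepl() gives TRUE/FALSE per row; as.integer() turns that into 1/0
df <- transform(df, Topic = as.integer(grepl(paste0(keywords, collapse = '|'), Comment)))
df$Topic
# [1] 1 1
```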
This is my desired output:
Category | Comment | Topic |
---|---|---|
Sales | i have to use my email everyday and they dont work the poor communication is not acceptable | email, poor communication |
Marketing | i think the tools are not adequate for the tasks we want to achieve | tools |
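For comparison, here is a base R sketch that produces this output without extra packages. It swaps in `gregexpr()` and `regmatches()`, which neither answer below uses, and it assumes the same sample data. Matches come back in order of appearance in the comment, so the first row gives "email, poor communication" as desired.

```r
# Sample data from the question (assumed to match the tables above)
df <- data.frame(
  Category = c("Sales", "Marketing"),
  Comment = c("i have to use my email everyday and they dont work the poor communication is not acceptable",
              "i think the tools are not adequate for the tasks we want to achieve")
)
keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')

# One alternation pattern: "poor communication|email|tools|hardware|software"
pattern <- paste0(keywords, collapse = "|")

# gregexpr() finds every match per comment; regmatches() extracts the matched text
matches <- regmatches(df$Comment, gregexpr(pattern, df$Comment))

# Collapse each row's matches into a single comma-separated string ("" if none)
df$Topic <- vapply(matches, toString, character(1))
df$Topic
# [1] "email, poor communication" "tools"
```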
```r
library(tidyverse)

df <- tibble(
  Category = c("Sales", "Marketing"),
  Comment = c(
    "i have to use my email everyday and they dont work the poor communication is not acceptable",
    "i think the tools are not adequate for the tasks we want to achieve"
  )
)

keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')

df %>%
  # Apply the following to each row
  rowwise() %>%
  mutate(
    Topic =
      # Extract each keyword from the comment (NA where it is absent)
      str_extract(Comment, keywords) %>%
      # Remove the NAs
      na.omit() %>%
      # Paste the found keywords into one string
      paste0(collapse = ", ")
  )
```
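`rowwise()` leaves the result grouped by row. Below is a sketch of an alternative that avoids it by mapping over the comments with `purrr::map_chr()`. It swaps in `str_detect()` with `fixed()` so the keywords are matched literally rather than as regular expressions; this is not part of the original answer. As in the answer above, the keywords come out in the order of the `keywords` vector, not their order in the comment.

```r
library(dplyr)
library(purrr)
library(stringr)

df <- tibble(
  Category = c("Sales", "Marketing"),
  Comment = c(
    "i have to use my email everyday and they dont work the poor communication is not acceptable",
    "i think the tools are not adequate for the tasks we want to achieve"
  )
)
keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')

df <- df %>%
  mutate(
    Topic = map_chr(Comment, function(txt) {
      # str_detect() is vectorised over the patterns: one TRUE/FALSE per keyword
      found <- keywords[str_detect(txt, fixed(keywords))]
      # Join the keywords that were found ("" if none)
      paste(found, collapse = ", ")
    })
  )
df$Topic
# [1] "poor communication, email" "tools"
```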
Using `str_extract_all` you can extract all the keywords present in `Comment`, then use `sapply` to collapse them into one comma-separated string for each row.
```r
library(stringr)
df$topic <- sapply(str_extract_all(df$Comment, paste0(keywords, collapse = '|')), toString)
df$topic
#[1] "email, poor communication" "tools"
```
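A plain alternation also matches keywords inside longer words, e.g. "tools" inside "toolsets" or "email" inside "emails". The sketch below wraps the alternation in `\\b` word boundaries so only whole words and phrases match. The two comments are hypothetical, made up to show the difference.

```r
library(stringr)

# Hypothetical comments: the second contains "tools"/"email" only inside longer words
comments <- c("my email and tools are broken",
              "the toolsets are fine, but emails bounce")
keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')

# Without boundaries, partial-word matches are extracted too
loose <- sapply(str_extract_all(comments, paste0(keywords, collapse = "|")), toString)

# \\b on both sides of the group keeps only whole-word matches
pattern <- paste0("\\b(", paste0(keywords, collapse = "|"), ")\\b")
strict <- sapply(str_extract_all(comments, pattern), toString)

loose
# [1] "email, tools" "tools, email"
strict
# [1] "email, tools" ""
```

If a keyword could contain regex metacharacters (such as `.` or `+`), it would need escaping before being pasted into the pattern.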
Data

It is easier to help if you provide data in a reproducible format:

```r
df <- structure(list(Category = c("Sales", "Marketing"), Comment = c("i have to use my email everyday and they dont work the poor communication is not acceptable",
"i think the tools are not adequate for the tasks we want to achieve"
)), row.names = c(NA, -2L), class = "data.frame")
```