KikiZ

Reputation: 59

Return the keywords from a character vector that appear in a data frame column, in a new column

I have a data frame of text comments and a character vector of keywords. For each row, I want to check which keywords appear in the comment and put the matching keywords into a new column.

My current code puts a 1 next to each row whose comment contains any of the keywords. I want to replace that flag with the matching keywords themselves.

keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')
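# grepl() returns TRUE/FALSE; as.integer() turns that into the 1/0 flag shown below
df <- transform(df, Topic = as.integer(grepl(paste0(keywords, collapse = '|'), df$Comment)))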
Category Comment Topic
Sales i have to use my email everyday and they dont work the poor communication is not acceptable 1
Marketing i think the tools are not adequate for the tasks we want to achieve 0

This is my desired output:

Category Comment Topic
Sales i have to use my email everyday and they dont work the poor communication is not acceptable email, poor communication
Marketing i think the tools are not adequate for the tasks we want to achieve tools

Upvotes: 0

Views: 803

Answers (2)

Vinícius Félix

Reputation: 8811

library(tidyverse)

df <-
  tibble(
    Category = c("Sales","Marketing"),
    Comment = c("i have to use my email everyday and they dont work the poor communication is not acceptable",
                "i think the tools are not adequate for the tasks we want to achieve"   
                )
  )

keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')


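df %>% 
  # Apply the following steps to each row separately
  rowwise() %>% 
  mutate(
    Topic =
      # Extract each keyword from the comment (NA where a keyword is absent)
      str_extract(Comment, keywords) %>%
      # Remove the NAs
      na.omit() %>% 
      # Paste the matched keywords into one comma-separated string
      paste0(collapse = ", ")
  )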
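
To make the row-wise step above easier to follow, here is a minimal sketch of what happens for a single comment, assuming stringr is installed. The variable names comment, hits and topic are only for illustration and are not part of the answer's code.

```r
library(stringr)

keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')
comment <- "i have to use my email everyday and they dont work the poor communication is not acceptable"

# str_extract() is vectorised over the pattern, so one comment checked against
# five keywords gives five results, with NA where a keyword is absent
hits <- str_extract(comment, keywords)
hits
#[1] "poor communication" "email"              NA                   NA                   NA

# Dropping the NAs and collapsing gives the Topic value for this row
topic <- paste0(hits[!is.na(hits)], collapse = ", ")
topic
#[1] "poor communication, email"
```

Note that the matches come out in the order of keywords, not in the order they appear in the comment.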


Upvotes: 1

Ronak Shah

Reputation: 388817

You can use str_extract_all to extract every keyword that appears in Comment, then use sapply with toString to collapse the matches into one comma-separated string per row.

library(stringr)

df$topic <- sapply(str_extract_all(df$Comment, paste0(keywords, collapse = '|')), toString)
df$topic

#[1] "email, poor communication" "tools"         

data

It is easier to help if you provide data in a reproducible format -

df <- structure(list(Category = c("Sales", "Marketing"), Comment = c("i have to use my email everyday and they dont work the poor communication is not acceptable", 
"i think the tools are not adequate for the tasks we want to achieve"
)), row.names = c(NA, -2L), class = "data.frame")
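
If you would rather not load stringr, the same idea can be sketched in base R with gregexpr and regmatches. This is only an illustrative alternative under the same assumptions (the keywords are used as a regular expression joined with '|'); the pattern and matches variables are names introduced here.

```r
df <- data.frame(
  Category = c("Sales", "Marketing"),
  Comment = c("i have to use my email everyday and they dont work the poor communication is not acceptable",
              "i think the tools are not adequate for the tasks we want to achieve")
)
keywords <- c('poor communication', 'email', 'tools', 'hardware', 'software')

# One alternation pattern that matches any of the keywords
pattern <- paste0(keywords, collapse = '|')

# gregexpr() finds every match position per comment,
# regmatches() pulls out the matched text as a list of character vectors
matches <- regmatches(df$Comment, gregexpr(pattern, df$Comment))

# Collapse each row's matches into one comma-separated string
df$topic <- sapply(matches, toString)
df$topic
#[1] "email, poor communication" "tools"
```

A comment with no matching keyword gets an empty string here, since toString() of an empty vector is "".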

Upvotes: 0
