Ashti

Reputation: 97

A way to append results when multiple conditions are met

I've provided two rows of my dataset below. The dataset has a text column, and I have certain keywords that I need to look up in it. There are about 20 keyword searches to perform, and a given text may contain none of the keywords or all of them (I've only shown two keyword searches below). At the end, I need a new column called actual_tag that tells me which keywords the text column contains.

I know how to turn each keyword into its own column, but I need a single column that tells the user which keywords the text contained, without writing too many if statements. Is there an easy way to do this?

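library(dplyr)
library(stringr)

# Two sample rows; actual_tag is the desired output
df=data.frame(text=c("the discrepency between the two items are great","there is discrepency between the calib"), 
                  actual_tag=c('discrepency','discrepency, calib'))

# What I have now: one logical column per keyword
df2=df%>%mutate(discrepency=str_detect(text,'discrepency'),
               calib=str_detect(text,'calib'))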

Upvotes: 0

Views: 26

Answers (1)

Gregor Thomas

Reputation: 146224

Create a combined pattern using | to separate each individual pattern and use stringr::str_extract_all:

keywords = c("discrepency", "calib")
pattern = paste(keywords, collapse = "|")
df %>%
  mutate(result = stringr::str_extract_all(text, pattern))
#                                              text         actual_tag             result
# 1 the discrepency between the two items are great        discrepency        discrepency
# 2          there is discrepency between the calib discrepency, calib discrepency, calib
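To make the mechanics visible, here is a minimal sketch on a plain character vector. The `texts` vector is made up for illustration (it is not from the question); the second string contains no keyword at all.

```r
library(stringr)

keywords <- c("discrepency", "calib")

# Joining with "|" builds a regex alternation: match any one of the keywords
pattern <- paste(keywords, collapse = "|")
pattern
# "discrepency|calib"

# Hypothetical inputs: one with both keywords, one with none
texts <- c("there is discrepency between the calib", "nothing to flag here")

# One list element per input string; a string with no match gives character(0)
matches <- str_extract_all(texts, pattern)
lengths(matches)
# 2 0
```

Because every input string gets its own vector of matches, the same call scales to 20 keywords by extending `keywords`, with no per-keyword column or if statement.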

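The result will be a list column, but you could collapse it into a plain character column if you prefer:

df %>%
  mutate(
    result = stringr::str_extract_all(text, pattern),
    # sapply() returns a character vector; lapply() would leave a list column
    result = sapply(result, toString)
  )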
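One thing to watch for: `str_extract_all` returns every occurrence, so a keyword that appears twice in the same text shows up twice in the result. If that matters for your data, a possible sketch is to de-duplicate before collapsing. The `text` values below are invented for illustration, and `tags` is just a name chosen here.

```r
library(stringr)

keywords <- c("discrepency", "calib")
pattern <- paste(keywords, collapse = "|")

# Hypothetical inputs: a repeated keyword, and a text with no keyword
text <- c("calib issue, then another calib issue", "no keywords here")

found <- str_extract_all(text, pattern)

# unique() drops repeated matches; toString() joins what is left with ", ".
# vapply() guarantees a character vector with one string per row.
tags <- vapply(found, function(m) toString(unique(m)), character(1))
tags
# "calib" ""
```

Rows with no keyword end up as an empty string, which you could replace with `NA` afterwards if that suits the data better.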

Upvotes: 2
