Reputation: 97
I've provided two rows of my dataset below. The dataset has a text column, and I need to look up certain keywords in it. There are about 20 keyword searches to perform, and a given text may contain none of the keywords or all of them (I've only included two keyword searches below). In the end, I need a new column called actual_tag that tells me which keywords the text column contains.
I know how to turn each keyword into its own column, but I need a single column that tells the user which keywords were found, without writing a lot of if statements. Is there an easy way to do this?
library(dplyr)
library(stringr)

df = data.frame(text = c("the discrepency between the two items are great", "there is discrepency between the calib"),
                actual_tag = c('discrepency', 'discrepency, calib'))
# What I have now: one logical column per keyword
df2 = df %>% mutate(discrepency = str_detect(text, 'discrepency'),
                    calib = str_detect(text, 'calib'))
Upvotes: 0
Views: 26
Reputation: 146224
Create a combined pattern using | to separate each individual pattern, and use stringr::str_extract_all:
keywords = c("discrepency", "calib")
pattern = paste(keywords, collapse = "|")
df %>%
mutate(result = stringr::str_extract_all(text, pattern))
#                                              text         actual_tag             result
# 1 the discrepency between the two items are great        discrepency        discrepency
# 2          there is discrepency between the calib discrepency, calib discrepency, calib
The result will be a list column, but you could collapse it if you prefer:
df %>%
  mutate(
    result = stringr::str_extract_all(text, pattern),
    # vapply() returns a plain character vector; lapply() would leave a list column
    result = vapply(result, toString, character(1))
  )
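Two edge cases may matter with around 20 keywords: a keyword that appears more than once in a text is extracted once per occurrence, and a text with no keywords collapses to an empty string. Here is a minimal sketch of one way to handle both, assuming you want each keyword listed once and NA when nothing matches. The third row, the second occurrence of "discrepency" in row two, and the names `found` and `tagged` are made up for illustration.

```r
library(dplyr)
library(stringr)

# Illustrative data: row 2 repeats a keyword, row 3 contains none (both invented)
df <- data.frame(
  text = c(
    "the discrepency between the two items are great",
    "there is discrepency between the calib, and again discrepency",
    "nothing relevant here"
  )
)

keywords <- c("discrepency", "calib")
pattern <- paste(keywords, collapse = "|")

tagged <- df %>%
  mutate(
    # list column: every occurrence of any keyword, in order of appearance
    found = str_extract_all(text, pattern),
    # drop repeats, then collapse each element to one comma-separated string
    actual_tag = vapply(found, function(m) toString(unique(m)), character(1)),
    # texts with no keyword collapse to ""; turn those into NA
    actual_tag = na_if(actual_tag, "")
  ) %>%
  select(-found)

tagged$actual_tag
```

Keywords are listed in the order they first appear in the text, not in the order of `keywords`. If any keyword contains regex metacharacters such as `.` or `(`, it would need escaping before being pasted into the pattern.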
Upvotes: 2