Reputation: 97
I am looking for an efficient way to code the following. If the text contains both "add" and "onion", it should be marked Found; otherwise it should be marked Not Found. I don't want to hard-code all the combinations; I only want to check whether add and onion are both present in the text.
library(stringr)  # str_detect() and regex() come from stringr
word_check <- c("add get onion" ,
"add to onion",
"add oil to onion",
"add oils to onion" ,
"add salt to onion" ,
"add get onion" ,
"add get onion",
"add get onion")
df <- as.data.frame(c("I can add get onion" ,
"we can add to onion",
"I love to add oil to onion",
"I may not add oils to onion" ,
"add salt to onion" ,
"add get onion" ,
"abc",
"def" ,
"ghi",
"jkl",
"add get onion",
"add get onion","add oil to the vegetable", "add onion to the vegetable" ))
names(df)[1] <- "text"
pattern_word_check <- paste(word_check, collapse = "|")
df$New <- ifelse(str_detect(df$text, regex(pattern_word_check)),"Found","Not Found")
Regards, R
Upvotes: 1
Views: 48
Reputation: 643
Here is a solution using tidytext. For your concrete example this may seem a bit like overkill, but using higher-level functions like a tokenizer together with an inner_join makes the code clearer and easier to build on (IMO).
df <- as.data.frame(c("I can add get onion" ,
"we can add to onion",
"I love to add oil to onion",
"I may not add oils to onion" ,
"add salt to onion" ,
"add get onion" ,
"abc",
"def" ,
"ghi",
"jkl",
"add get onion",
"add get onion","add oil to the vegetable", "add onion to the vegetable" ), stringsAsFactors = FALSE)
names(df)[1] <- "text"
library(dplyr)
library(tidytext)
df_words <- df %>%
unnest_tokens(output = word,
input = text,
token = "words",
drop = FALSE)
inner_join(
df_words %>% filter(word == "add"),
df_words %>% filter(word == "onion"),
by = "text"
) %>%
select(text) %>%
distinct()
#> text
#> 1 I can add get onion
#> 2 we can add to onion
#> 3 I love to add oil to onion
#> 4 I may not add oils to onion
#> 5 add salt to onion
#> 6 add get onion
#> 7 add onion to the vegetable
Created on 2020-04-02 by the reprex package (v0.3.0)
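The inner_join above returns only the matching texts. If you want the Found / Not Found column from the question for every row while keeping the token idea, it can be sketched in base R. This is a swapped-in approximation, not tidytext: strsplit() on runs of non-letters stands in for unnest_tokens(), and the df below is a shortened, illustrative version of the question's data.

```r
df <- data.frame(
  text = c("I can add get onion", "abc", "add oil to the vegetable",
           "add onion to the vegetable", "Add ONION"),
  stringsAsFactors = FALSE
)

required <- c("add", "onion")

# Lower-case and split on runs of non-letters: a rough approximation of the
# default "words" tokenizer, not the same algorithm
tokens <- strsplit(tolower(df$text), "[^a-z]+")

# TRUE when every required word appears among a row's tokens, in any order
found <- vapply(tokens, function(tok) all(required %in% tok), logical(1))

df$New <- ifelse(found, "Found", "Not Found")
df
```

Because the check is all(required %in% tok), adding a third word only means extending required; no combinations have to be written out.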
Upvotes: 0
Reputation: 389325
Since you only want to check for "onion" and "add", which can occur in any order, you could do:
df$New <- ifelse(grepl('.*add.*onion.*|.*onion.*add.*',df$text), "found", "not found")
#Faster option without ifelse
#df$New <- c('not found', 'found')[grepl('.*add.*onion.*|.*onion.*add.*', df$text) + 1]
df
# text New
#1 I can add get onion found
#2 we can add to onion found
#3 I love to add oil to onion found
#4 I may not add oils to onion found
#5 add salt to onion found
#6 add get onion found
#7 abc not found
#8 def not found
#9 ghi not found
#10 jkl not found
#11 add get onion found
#12 add get onion found
#13 add oil to the vegetable not found
#14 add onion to the vegetable found
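A variant of the same idea, as a minimal sketch: one grepl() call per word combined with &, so the order never matters and you don't have to list every permutation (with n words, the alternation approach needs n! branches). The \\b word boundaries are an assumption about what you want; without them, a word like "address" would count as "add". The sample data is shortened and includes one made-up row to show that case.

```r
df <- data.frame(
  text = c("I can add get onion", "abc", "add oil to the vegetable",
           "add onion to the vegetable", "my address is on onion street"),
  stringsAsFactors = FALSE
)

# One grepl() per word, combined with &, so word order does not matter.
# \\b is a word boundary, so "address" does not match "add".
has_both <- grepl("\\badd\\b", df$text) & grepl("\\bonion\\b", df$text)

# Index into a two-element vector: FALSE + 1 = 1, TRUE + 1 = 2
df$New <- c("not found", "found")[has_both + 1]
df
```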
Upvotes: 0
Reputation: 6226
Maybe I misunderstood, so I propose one solution based on your pattern_word_check variable and another using only onion and add in the regex. Either way, I think you are looking for grepl. There are many ways to solve your problem.
A data.table solution, using conditional replacement, would be:
library(data.table)
setDT(df)
df[,'new' := "Not Found"]
df[grepl(pattern_word_check, text), new := "Found"]
If you only want to consider texts containing "onion" OR "add":
df[,'new' := "Not Found"]
df[grepl("(onion|add)", text), new := "Found"]
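Note that "(onion|add)" flags rows containing either word, while the question asks for both. A data.table sketch for the AND case, assuming whole-word matches are wanted, could look like this (dt and its rows are illustrative, not the question's full data):

```r
library(data.table)

dt <- data.table(text = c("I can add get onion", "add oil to the vegetable",
                          "add onion to the vegetable", "abc",
                          "my address is onion street"))

# Default every row to "Not Found" ...
dt[, new := "Not Found"]
# ... then overwrite only rows where BOTH whole words occur, in any order.
# \\b keeps "address" from counting as "add".
dt[grepl("\\badd\\b", text) & grepl("\\bonion\\b", text), new := "Found"]
dt
```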
A dplyr solution would be:
library(dplyr)
df %>% mutate(new = if_else(grepl(pattern_word_check, text), "Found", "Not Found"))
Note that I use if_else from the dplyr package, not base ifelse.
If you only want to consider texts containing "onion" OR "add":
library(dplyr)
df %>% mutate(new = if_else(grepl("(onion|add)", text), "Found", "Not Found"))
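To avoid hard-coding combinations entirely, the AND check can be wrapped in a small helper that takes any vector of words. contains_all() below is a hypothetical name, not a function from any package; it assumes the words contain no regex metacharacters and matches them case-insensitively as whole words.

```r
# Hypothetical helper: TRUE for each element of `text` that contains every
# word in `words` as a whole word (case-insensitive), in any order.
contains_all <- function(text, words) {
  hits <- lapply(words, function(w) {
    grepl(paste0("\\b", w, "\\b"), text, ignore.case = TRUE)
  })
  # AND the per-word results together; init keeps the length right
  # even if `words` is empty
  Reduce(`&`, hits, init = rep(TRUE, length(text)))
}

df <- data.frame(
  text = c("I can add get onion", "add oil to the vegetable",
           "Onion: add salt", "abc"),
  stringsAsFactors = FALSE
)
df$new <- ifelse(contains_all(df$text, c("add", "onion")), "Found", "Not Found")
df
```

Because it returns a plain logical vector, it should also drop into the dplyr version as mutate(new = if_else(contains_all(text, c("add", "onion")), "Found", "Not Found")).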
Upvotes: 1