Reputation: 1111
Consider the following dataset:
a <- c("my house", "green", "the cat is", "a girl")
b <- c("my beautiful house is cool", "the apple is green", "I m looking at the cat that is sleeping", "a boy")
c <- c("T", "T", "T", "F")
df <- data.frame(string1=a, string2=b, returns=c)
I'm trying to detect string1 in string2, but my goal is not only to detect exact matches. I'm looking for a way to detect the presence of the words of string1 in string2, whatever order the words appear in. For example, the string "my beautiful house is cool" should return TRUE when searching for "my house".
I have illustrated the expected behaviour in the "returns" column of the example dataset above.
I have tried the grepl() and str_detect() functions, but they only work with exact matches. Can you please help? Thanks in advance.
Upvotes: 2
Views: 1931
Reputation: 39858
One base R option without the involvement of split could be:
n_words <- lengths(regmatches(df[, 1], gregexpr(" ", df[, 1], fixed = TRUE))) + 1
n_matches <- mapply(FUN = function(x, y) lengths(regmatches(x, gregexpr(y, x))),
df[, 2],
gsub(" ", "|", df[, 1], fixed = TRUE),
USE.NAMES = FALSE)
n_matches == n_words
[1] TRUE TRUE TRUE FALSE
It does, however, assume that there is at least one word per row in string1.
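One way to guard against that assumption is to run the count only on rows where string1 is non-empty. This is a minimal sketch, not part of the original answer: returning NA for empty rows is an assumed convention, and the shortened `string1`/`string2` vectors are illustrative.

```r
# Illustrative data: the second row has an empty string1
string1 <- c("my house", "", "a girl")
string2 <- c("my beautiful house is cool", "the apple is green", "a boy")

# Rows that actually contain at least one word
idx <- nzchar(trimws(string1))

# Assumed convention: NA when there is nothing to search for
result <- rep(NA, length(string1))

# Same counting logic as above, restricted to the non-empty rows
n_words <- lengths(regmatches(string1[idx],
                              gregexpr(" ", string1[idx], fixed = TRUE))) + 1
n_matches <- mapply(FUN = function(x, y) lengths(regmatches(x, gregexpr(y, x))),
                    string2[idx],
                    gsub(" ", "|", string1[idx], fixed = TRUE),
                    USE.NAMES = FALSE)
result[idx] <- n_matches == n_words

result
```

Because the comparison still counts matches, a word that occurs twice in string2 is counted twice, so this guard only addresses the empty-row case.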
Upvotes: 1
Reputation: 2835
The trick here is not to use str_detect as is, but to first split the search_words into individual words. This is done with strsplit() below. We then pass the result to str_detect to check whether all words are matched.
library(stringr)
search_words <- c("my house", "green", "the cat is", "a girl")
words <- c("my beautiful house is cool", "the apple is green", "I m looking at the cat that is sleeping", "a boy")
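# Split each search phrase into its individual words
patterns <- strsplit(search_words, " ")
# For each string, check that every word of the matching pattern is detected
mapply(function(string, pattern) all(str_detect(string, pattern)), words, patterns)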
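Note that str_detect matches substrings, so a short search word such as "a" is also found inside "cat". If whole-word matching is wanted, one option is to split both sides and compare the words directly. This is a minimal base R sketch: the helper name `contains_all_words`, the `strings` vector, and the extra fifth example are hypothetical, and it assumes words are separated by single spaces.

```r
search_words <- c("my house", "green", "the cat is", "a girl", "a")
strings <- c("my beautiful house is cool", "the apple is green",
             "I m looking at the cat that is sleeping", "a boy", "the cat")

# Hypothetical helper: TRUE when every word of `search` appears
# as a whole word in `string` (assumes single-space separators)
contains_all_words <- function(string, search) {
  needed  <- strsplit(search, " ", fixed = TRUE)[[1]]
  present <- strsplit(string, " ", fixed = TRUE)[[1]]
  all(needed %in% present)
}

result <- mapply(contains_all_words, strings, search_words, USE.NAMES = FALSE)
result
```

In the last pair, "a" is not a word of "the cat", so the result is FALSE, whereas a substring search would report a match.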
Upvotes: 2