Remi

Reputation: 1111

Detect part of a string in R (not exact match)

Consider the following dataset:

a <- c("my house", "green", "the cat is", "a girl")
b <- c("my beautiful house is cool", "the apple is green", "I m looking at the cat that is sleeping", "a boy")
c <- c("T", "T", "T", "F")
df <- data.frame(string1=a, string2=b, returns=c)

I'm trying to detect string1 in string2, but my goal is not limited to exact matches. I'm looking for a way to detect the presence of the words of string1 in string2, regardless of the order in which they appear. For example, the string "my beautiful house is cool" should return TRUE when searching for "my house".

I have illustrated the expected behaviour in the "returns" column of the example dataset above.

I have tried the grepl() and str_detect() functions, but they only work with an exact match. Can you please help? Thanks in advance.
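
A minimal sketch of the problem, assuming the attempt looked roughly like the call below (the exact call is not shown in the question). grepl() searches for the whole phrase as one contiguous pattern, so it fails as soon as another word sits between "my" and "house", even though each word on its own is found:

```r
string2 <- "my beautiful house is cool"

# The full phrase is not a contiguous substring of string2
phrase_found <- grepl("my house", string2, fixed = TRUE)

# Searching for each word separately succeeds for both words
words_found <- sapply(c("my", "house"), grepl, x = string2, fixed = TRUE)

phrase_found
words_found
```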

Upvotes: 2

Views: 1931

Answers (2)

tmfmnk

Reputation: 39858

One base R option that does not involve splitting the strings could be:

n_words <- lengths(regmatches(df[, 1], gregexpr(" ", df[, 1], fixed = TRUE))) + 1

n_matches <- mapply(FUN = function(x, y) lengths(regmatches(x, gregexpr(y, x))), 
                    df[, 2],
                    gsub(" ", "|", df[, 1], fixed = TRUE),
                    USE.NAMES = FALSE)

n_matches == n_words

[1]  TRUE  TRUE  TRUE FALSE

Note, however, that it assumes every row of string1 contains at least one word.
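
Because the pattern above counts regex hits, a partial or repeated hit can throw the count off: for "a girl" against "a cat", the "a" inside "cat" is counted too, giving two hits for two words. A minimal sketch of a stricter variant follows. Unlike the code above it does use strsplit(), it assumes words are separated by single spaces and must match as whole words, and the helper name `all_words_in` is made up for illustration:

```r
# Hypothetical helper: TRUE when every space-separated word of `needle`
# also appears as a whole word in `haystack`.
all_words_in <- function(needle, haystack) {
  needle_words   <- strsplit(needle, " ", fixed = TRUE)[[1]]
  haystack_words <- strsplit(haystack, " ", fixed = TRUE)[[1]]
  all(needle_words %in% haystack_words)
}

string1 <- c("my house", "green", "the cat is", "a girl")
string2 <- c("my beautiful house is cool", "the apple is green",
             "I m looking at the cat that is sleeping", "a boy")

# Compare the two vectors element by element
result <- mapply(all_words_in, string1, string2, USE.NAMES = FALSE)
result
```

With this variant, `all_words_in("a girl", "a cat")` is FALSE, since "girl" is not among the words of "a cat".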

Upvotes: 1

Sada93

Reputation: 2835

The trick here is not to use str_detect() as is, but to first split each entry of search_words into individual words. This is done with strsplit() below. We then pass the resulting words to str_detect() and check that all of them are matched.

library(stringr)
search_words <- c("my house", "green", "the cat is", "a girl")
words <- c("my beautiful house is cool", "the apple is green", "I m looking at the cat that is sleeping", "a boy")

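# Split each search phrase into its individual words (one character vector per phrase)
patterns <- strsplit(search_words, " ")

# For each string, check that every word of the matching search phrase is detected
mapply(function(string, pattern) all(str_detect(string, pattern)), words, patterns)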
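
Note that str_detect() treats each word as a regular expression and matches it anywhere in the string, so "a" is also found inside "cat". A minimal sketch of a whole-word variant applied to the data frame from the question follows. It assumes the search words contain no regex metacharacters, and the helper name `contains_all_words` and the column name `detected` are hypothetical:

```r
library(stringr)

# Hypothetical helper: TRUE when every word of `search` occurs in `string`
# as a whole word.
contains_all_words <- function(string, search) {
  words <- str_split(search, fixed(" "))[[1]]
  # \\b anchors each word at word boundaries so partial hits are ignored
  all(str_detect(string, paste0("\\b", words, "\\b")))
}

df <- data.frame(
  string1 = c("my house", "green", "the cat is", "a girl"),
  string2 = c("my beautiful house is cool", "the apple is green",
              "I m looking at the cat that is sleeping", "a boy"),
  stringsAsFactors = FALSE
)

# Row-wise comparison of string2 against the words of string1
df$detected <- mapply(contains_all_words, df$string2, df$string1,
                      USE.NAMES = FALSE)
df$detected
```

If the search words may contain characters such as "." or "(", they would need escaping before being pasted into the pattern.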

Upvotes: 2
