user6633625673888

Reputation: 635

R Match character vectors

var1 is a character vector

var1 <- c("tax evasion", "all taxes", "payment")

and var2 is another character vector

var2 <- c("bill", "income tax", "sales taxes")

I want to compare var1 and var2 and extract the terms that share a partial word match. For example, the desired answer in this case is the following character vector:

"tax evasion", "all taxes", "income tax", "sales taxes"

I tried

sapply(var1, grep, var2, ignore.case=TRUE, value=TRUE)

but this does not give the desired answer. How can it be done?

Thanks.

Upvotes: 2

Views: 857

Answers (2)

akrun

Reputation: 887028

Maybe you need

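# Split each phrase into its individual words
lst1 <- strsplit(var1, ' ')
lst2 <- strsplit(var2, ' ')

# Flag phrases in which any word contains a word from the other vector
indx1 <- sapply(lst1, function(x) any(grepl(paste(unlist(lst2), collapse="|"), x)))
indx2 <- sapply(lst2, function(x) any(grepl(paste(unlist(lst1), collapse="|"), x)))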
c(var1[indx1], var2[indx2])
#[1] "tax evasion" "all taxes"   "income tax"  "sales taxes"

If var1 and var2 share elements, wrap the result in unique(), as @ColonelBeauvel did in his elegant solution.
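
A minimal sketch of that unique() wrapping, assuming a hypothetical var2 in which "all taxes" also appears (this input is made up for illustration and is not from the question). Without unique(), "all taxes" would be returned once from each vector:

```r
# Hypothetical inputs: "all taxes" appears in both vectors
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "all taxes")

lst1 <- strsplit(var1, " ")
lst2 <- strsplit(var2, " ")

# One alternation pattern per vector, built from its individual words
pat1 <- paste(unlist(lst1), collapse = "|")
pat2 <- paste(unlist(lst2), collapse = "|")

indx1 <- sapply(lst1, function(x) any(grepl(pat2, x)))
indx2 <- sapply(lst2, function(x) any(grepl(pat1, x)))

# unique() drops the second copy of "all taxes"
res <- unique(c(var1[indx1], var2[indx2]))
res
#[1] "tax evasion" "all taxes"   "income tax"
```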

Upvotes: 1

Colonel Beauvel

Reputation: 31161

You can do this (I use the magrittr package to keep the code clear):

library(magrittr)

# For each word in u, return the elements of v that contain it
findIn <- function(u, v)
{
    strsplit(u, ' ') %>%
        unlist %>%
        sapply(grep, value=TRUE, x=v) %>%
        unlist %>%
        unique
}

unique(c(findIn(var1, var2), findIn(var2, var1)))
#[1] "income tax"  "sales taxes" "tax evasion" "all taxes"
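
For readers without magrittr, here is a rough base R sketch of the same idea. It also shows why the attempt in the question finds nothing: it passes whole phrases such as "tax evasion" as patterns, and none of them occurs inside an element of var2. The name find_in is a hypothetical helper introduced here, not part of any package.

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

# The question's attempt: each full phrase is used as the pattern,
# so every element of the result is empty
attempt <- sapply(var1, grep, var2, ignore.case = TRUE, value = TRUE)
lengths(attempt)

# Hypothetical base R equivalent of findIn(): one pattern per word
find_in <- function(u, v) {
  words <- unlist(strsplit(u, " "))
  unique(unlist(lapply(words, grep, x = v, value = TRUE)))
}

res <- unique(c(find_in(var1, var2), find_in(var2, var1)))
res
#[1] "income tax"  "sales taxes" "tax evasion" "all taxes"
```

Note that grep treats each word as a regular expression and matches substrings, so a short word such as "all" would also match inside "ball" or "small".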

Upvotes: 3
