Reputation: 635
var1 is a character vector
var1 <- c("tax evasion", "all taxes", "payment")
and var2 is another character vector
var2 <- c("bill", "income tax", "sales taxes")
I want to compare var1 and var2 and extract the terms that have a partial word match. For example, the desired answer in this case is the following character vector:
"tax evasion", "all taxes", "income tax", "sales taxes"
I tried
sapply(var1, grep, var2, ignore.case=T,value=T)
but I am not getting the desired answer. How can this be done?
Thanks.
Upvotes: 2
Views: 857
Reputation: 887028
Maybe you need
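# Split each phrase into its individual words
lst1 <- strsplit(var1, ' ')
lst2 <- strsplit(var2, ' ')
# Flag phrases in var1 with a word that matches any word from var2
indx1 <- sapply(lst1, function(x) any(grepl(paste(unlist(lst2),
                                      collapse="|"), x)))
# Flag phrases in var2 the same way, against the words from var1
indx2 <- sapply(lst2, function(x) any(grepl(paste(unlist(lst1),
                                      collapse="|"), x)))
# Keep the flagged phrases from both vectors
c(var1[indx1], var2[indx2])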
#[1] "tax evasion" "all taxes" "income tax" "sales taxes"
If some elements appear in both var1 and var2, wrap the result with unique,
as @ColonelBeauvel did in his elegant solution.
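
As a side note on why the attempt in the question comes back empty: grep receives each whole phrase of var1 as its pattern, and no element of var2 contains "tax evasion" or "all taxes" in full. A minimal sketch of that check (the name attempt is only illustrative, not from the question):

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

# Each full phrase of var1 is used as the pattern, so nothing in var2 matches
attempt <- sapply(var1, grep, var2, ignore.case = TRUE, value = TRUE)
lengths(attempt)  # every element is empty

# A single word used as the pattern does find matches
grep("tax", var2, value = TRUE)
```

This is why the phrases are split into words first. Note that the patterns match substrings, which is what lets "tax" match "taxes".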
Upvotes: 1
Reputation: 31161
You can do the following (I use the magrittr package for code clarity):
library(magrittr)
# Split u into words, then return the elements of v that contain any of them
findIn <- function(u, v)
{
  strsplit(u, ' ') %>%
    unlist %>%
    sapply(grep, value=TRUE, x=v) %>%
    unlist %>%
    unique
}
unique(c(findIn(var1, var2), findIn(var2, var1)))
#[1] "income tax" "sales taxes" "tax evasion" "all taxes"
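
If you would rather not load magrittr, the same idea can be sketched in base R. The helper name find_in is hypothetical, and fixed = TRUE is my own assumption here: it treats each word as a literal string rather than a regular expression, which seems safer if the phrases may contain characters such as "." or "(".

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

# find_in is a hypothetical helper name, not part of any package
find_in <- function(u, v) {
  # Break every phrase in u into its individual words
  words <- unlist(strsplit(u, " ", fixed = TRUE))
  # For each word, collect the elements of v that contain it as a substring
  hits <- lapply(words, grep, x = v, value = TRUE, fixed = TRUE)
  # Flatten and drop phrases that were found through more than one word
  unique(unlist(hits))
}

# Search in both directions and combine the results
result <- unique(c(find_in(var1, var2), find_in(var2, var1)))
result
```

For the vectors in the question this should return the same four phrases as the piped version.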
Upvotes: 3