I have a character vector
var1 <- c("pine tree", "forest", "fruits", "water")
and a list
var2 <- list(c("tree", "house", "star"), c("house", "tree", "dense forest"), c("apple", "orange", "grapes"))
I want to match the words in var1 against the words in each element of var2, including words that occur inside phrases such as "pine tree" or "dense forest", and rank the list elements by the number of matches. For example,
[[2]]
[1] "house" "tree" "dense forest"
has 2 matches with var1 ("tree" matches a word of "pine tree", and "dense forest" contains "forest")
[[1]]
[1] "tree" "house" "star"
has 1 match with var1 ("tree")
[[3]]
[1] "apple" "orange" "grapes"
has 0 matches with var1
The desired output is the list ranked as follows:
[1] "house" "tree" "dense forest"
[2] "tree" "house" "star"
[3] "apple" "orange" "grapes"
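A minimal sketch of how these counts and the ranking could be computed in base R, assuming that a "word" is any whitespace-separated token and that a phrase in var2 counts as one match when any of its words also appears among the words of var1. The helper names (words1, count_matches, scores, ranked) are illustrative only.

```r
var1 <- c("pine tree", "forest", "fruits", "water")
var2 <- list(c("tree", "house", "star"),
             c("house", "tree", "dense forest"),
             c("apple", "orange", "grapes"))

# Split every phrase in var1 into lower-case words:
# "pine", "tree", "forest", "fruits", "water"
words1 <- unique(unlist(strsplit(tolower(var1), "\\s+")))

# Count the phrases in x that share at least one word with var1
# (assumption: any shared word counts, so "dense forest" matches "forest")
count_matches <- function(x) {
  phrase_words <- strsplit(tolower(x), "\\s+")
  sum(vapply(phrase_words, function(w) any(w %in% words1), logical(1)))
}

scores <- vapply(var2, count_matches, integer(1))  # 1 2 0

# Reorder the list so the element with the most matches comes first
ranked <- var2[order(scores, decreasing = TRUE)]
ranked
```

Note that this counts matching phrases per element, so a phrase containing two var1 words still counts once; if every shared word should count separately, the inner any() would need to become sum().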
I tried
sapply(var1, grep, var2, ignore.case=T, value=T)
but it did not give the desired output.
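The attempt above loops over var1 rather than over the elements of var2, and grep() expects a character vector: when given a list, it coerces it with as.character(), so each element of var2 becomes a single deparsed string. A minimal sketch that keeps the grep idea, assuming "word" means a whole word delimited by \b and that the words of var1 contain no regex metacharacters:

```r
var1 <- c("pine tree", "forest", "fruits", "water")
var2 <- list(c("tree", "house", "star"),
             c("house", "tree", "dense forest"),
             c("apple", "orange", "grapes"))

# What grep() actually searches when given the list:
# one string per element, e.g. 'c("tree", "house", "star")'
coerced <- as.character(var2)

# Build one whole-word pattern from the words of var1:
# "\\b(pine|tree|forest|fruits|water)\\b"
words1 <- unique(unlist(strsplit(var1, "\\s+")))
pattern <- paste0("\\b(", paste(words1, collapse = "|"), ")\\b")

# For each element of var2, count the phrases containing at least one of those words
scores <- sapply(var2, function(x) sum(grepl(pattern, x, ignore.case = TRUE, perl = TRUE)))

# Highest count first
var2[order(-scores)]
```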
How can I solve this? A code snippet would be appreciated. Thanks.
EDIT:
The problem has been changed from matching single words to matching words inside phrases, as described above.
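For the original single-word version of the question (where var2 contained "forest" rather than "dense forest"), you can try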
var2[[which.max(sapply(var2, function(x) sum(var1 %in% x)))]]
[1] "house" "tree" "forest"
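With the edited data, this exact-match approach finds nothing, because %in% compares whole strings: "forest" is not equal to "dense forest", and "pine tree" is not equal to "tree". A short check on the question's current var1 and var2 shows that every score is 0, so which.max() simply returns the first index:

```r
var1 <- c("pine tree", "forest", "fruits", "water")
var2 <- list(c("tree", "house", "star"),
             c("house", "tree", "dense forest"),
             c("apple", "orange", "grapes"))

# %in% tests whole-string equality, not words inside phrases
"forest" %in% "dense forest"   # FALSE

scores <- sapply(var2, function(x) sum(var1 %in% x))
scores              # 0 0 0
which.max(scores)   # 1: the first maximum, even though nothing matched
```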
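Following the OP's edit and @Frank's comment, to rank all elements by their number of exact matches instead of returning only the best one (output again for the single-word version of var2):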
var2[order(-sapply(var2, function(x) sum(var1 %in% x)))]
[[1]]
[1] "house" "tree" "forest"
[[2]]
[1] "tree" "house" "star"
[[3]]
[1] "apple" "orange" "grapes"