user6633625673888

Reputation: 635

R match words in list

I have a character vector

var1 <- c("pine tree", "forest", "fruits", "water")

and a list

var2 <- list(c("tree", "house", "star"),  c("house", "tree", "dense forest"), c("apple", "orange", "grapes"))

I want to match words in var1 with words in var2, and RANK the list elements according to the number of words matched. For example,

[[2]]
[1] "house"  "tree"   "dense forest"

has 2 matches with var1

[[1]]
[1] "tree"  "house" "star"   

has 1 match with var1

[[3]]
[1] "apple"  "orange" "grapes"

has 0 matches with var1

And the desired output is the following rank:

[1] "house"  "tree"   "dense forest"
[2] "tree"  "house" "star"
[3] "apple"  "orange" "grapes"

I tried

sapply(var1, grep,  var2, ignore.case=T, value=T)

without getting the output desired.

How can I solve this? A code snippet would be appreciated. Thanks.

EDIT:

The question has been edited from single-word matching to matching words within phrases, as described above.

Upvotes: 2

Views: 1729

Answers (1)

Mamoun Benghezal

Reputation: 5314

You can try the following, which returns the list element with the most matches:

var2[[which.max(lapply(var2, function(x) sum(var1 %in% x)))]]
[1] "house"  "tree"   "forest"

Following the OP's latest modification and @frank's comment, to rank all the list elements by number of matches:

var2[order(-sapply(var2, function(x) sum(var1 %in% x)))]
[[1]]
[1] "house"  "tree"   "forest"
[[2]]
[1] "tree"  "house" "star" 
[[3]]
[1] "apple"  "orange" "grapes"
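
Note that `%in%` compares whole strings, so with the phrase version of the question ("pine tree", "dense forest") the code above finds no matches; the output shown reflects the earlier single-word data. Below is a minimal sketch for the phrase case, assuming a "match" means that a whitespace-separated word in a list element also appears as a word somewhere in var1, ignoring case. The helper `split_words` and the names `words1`, `n_match` and `ranked` are made up for illustration.

```r
var1 <- c("pine tree", "forest", "fruits", "water")
var2 <- list(c("tree", "house", "star"),
             c("house", "tree", "dense forest"),
             c("apple", "orange", "grapes"))

# Hypothetical helper: break phrases into unique lower-case single words
split_words <- function(x) unique(tolower(unlist(strsplit(x, "\\s+"))))

# All single words that occur anywhere in var1
words1 <- split_words(var1)

# For each list element, count how many of its words occur in var1
n_match <- vapply(var2, function(x) sum(split_words(x) %in% words1), integer(1))

# Rank the list elements by decreasing number of matched words
ranked <- var2[order(-n_match)]
```

With the data from the question, `n_match` is 1, 2, 0, so `ranked` puts the second element first, then the first, then the third. If you need a different notion of matching (substrings, stemming), the `%in%` step is the part to replace.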

Upvotes: 4
