Reputation: 68
I have a large list of terms extracted from a corpus:
mylist <- list(c("flower"),
c("plant", "animal", "cats", "doggy"),
c("tree", "trees", "cat", "dog"))
The extracted terms come from a data frame that holds main words, similar words, and categories:
ref <- data.frame(id = c(1:5),
main = c("tree", "plant", "flower", "dog", "cat"),
similar = c("trees","plantlike", "flowery", "doggy", "cats"),
category = c("plant", "plant", "plant", "animal", "animal"))
I need to change the list so that it holds the categories instead of the words (needed below), ideally with duplicates removed as well (orbetter below):
needed <- list("plant",
c("plant", "animal", "animal", "animal"),
c("plant", "plant", "animal", "animal"))
orbetter <- list("plant",
c("plant", "animal"),
c("plant", "animal"))
However, I don't know how to apply this to each element of the list with sapply. I would appreciate your help.
Upvotes: 0
Views: 44
Reputation: 8880
mylist <- list(c("flower"),
c("plant", "animal", "cats", "doggy"),
c("tree", "trees", "cat", "dog"))
ref <- data.frame(id = c(1:5),
main = c("tree", "plant", "flower", "dog", "cat"),
similar = c("trees","plantlike", "flowery", "doggy", "cats"),
category = c("plant", "plant", "plant", "animal", "animal"))
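library(tidyr)
# Stack the main and similar columns into one value column,
# so that every term sits next to its category
ref_long <- ref %>%
  pivot_longer(-c(id, category))
# For each list element, look up the category of every term and drop duplicates;
# terms that are not in ref's main or similar columns (here "animal") come back as NA
lapply(mylist, function(x) unique(ref_long$category[match(x, table = ref_long$value)]))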
#> [[1]]
#> [1] "plant"
#>
#> [[2]]
#> [1] "plant" NA "animal"
#>
#> [[3]]
#> [1] "plant" "animal"
Created on 2022-01-14 by the reprex package (v2.0.1)
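The NA in the second element appears because "animal" only occurs in the category column of ref, not among the main or similar words. Here is a minimal base R sketch that might get closer to the desired output. It assumes that a term which is already a category name should map to itself, which is my reading of the expected result rather than something the question states. The names cats and lookup are only illustrative, and stringsAsFactors = FALSE is added defensively for older R versions.

```r
# Data from the question
mylist <- list(c("flower"),
               c("plant", "animal", "cats", "doggy"),
               c("tree", "trees", "cat", "dog"))
ref <- data.frame(id = 1:5,
                  main = c("tree", "plant", "flower", "dog", "cat"),
                  similar = c("trees", "plantlike", "flowery", "doggy", "cats"),
                  category = c("plant", "plant", "plant", "animal", "animal"),
                  stringsAsFactors = FALSE)

# Named lookup vector: names are terms, values are categories.
# Category names come first so that they map to themselves
# (assumption: "animal" should resolve to "animal").
cats <- unique(ref$category)
lookup <- c(setNames(cats, cats),
            setNames(ref$category, ref$main),
            setNames(ref$category, ref$similar))

# Keep duplicates ("needed" in the question)
needed <- lapply(mylist, function(x) unname(lookup[x]))

# Drop duplicates and any terms that have no match ("orbetter" in the question)
orbetter <- lapply(mylist, function(x) {
  out <- unique(unname(lookup[x]))
  out[!is.na(out)]
})
```

Indexing a named vector by name returns the first match, so a term that is both a main word and a category name (such as "plant") still resolves to a single category. If unmatched terms should stay visible, leave out the is.na() filter.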
Upvotes: 1