Reputation: 339
I want to convert a corpus to a DocumentTermMatrix with only selected words being tabulated. I know the "dictionary" parameter in the control list does this:
library(magrittr)  # provides %>%
library(tm)        # provides VCorpus, VectorSource, DocumentTermMatrix
a = list("I am a big big big apple", "Petter Petter Peter Peter")
v = VCorpus(VectorSource(a))
my_terms = c("peter", "petter")
DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>% as.matrix()
It gives me this:
    Terms
Docs peter petter
   1     0      0
   2     1      1
Whereas what I want looks like this:
    Terms
Docs peter petter
   1     0      0
   2     2      2
Is there a function or parameter that does this?
Upvotes: 1
Views: 657
Reputation: 1508
It works fine for me. With both packages loaded, the same code gives the counts you want:
library(magrittr)
library(tm)
a <- list("I am a big big big apple", "Petter Petter Peter Peter")
v <- VCorpus(VectorSource(a))
my_terms <- c("peter", "petter")
DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>%
  as.matrix()
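If you want to double-check which counts to expect without relying on tm at all, a minimal base-R sketch like the one below tabulates only the selected terms. The names `docs` and `count_terms` are made up for this example, and the tokenization (lowercasing, then splitting on anything that is not a letter) is an assumption that is simpler than what tm does internally, so treat it as a cross-check rather than a replacement.

```r
# Base-R cross-check of the expected counts (no tm required).
docs <- list("I am a big big big apple", "Petter Petter Peter Peter")
my_terms <- c("peter", "petter")

# Hypothetical helper: count only the terms listed in `terms`.
count_terms <- function(doc, terms) {
  # Lowercase, then split on runs of non-letter characters
  tokens <- strsplit(tolower(doc), "[^a-z]+")[[1]]
  # Fixed factor levels drop all other words and keep zero counts
  table(factor(tokens, levels = terms))
}

# One row per document, one column per selected term
counts <- t(sapply(docs, count_terms, terms = my_terms))
rownames(counts) <- seq_along(docs)
counts
```

For the two documents above, the second row should come out as 2 for both `peter` and `petter`. If tm gives you something different, the difference is most likely in your session (package versions or extra control options) rather than in the `dictionary` parameter.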
Upvotes: 0