user7453767

Reputation: 339

DocumentTermMatrix with dictionary

I want to convert a corpus to a DocumentTermMatrix with only selected words being tabulated. I know the "dictionary" parameter in the control list does this:

     a = list("I am a big big big apple", "Petter Petter Peter Peter")
     v = VCorpus(VectorSource(a))
     my_terms = c("peter", "petter")
     DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>% as.matrix()

It gives me this:

        Terms
    Docs peter petter
       1     0      0
       2     1      1

Whereas what I want looks like this:

        Terms
    Docs peter petter
       1     0      0
       2     2      2

Two requirements:

  1. The first document must stay in the matrix even though none of its words match, because each row has to line up with the metadata.
  2. The output must show the frequency of each word.

Is there a function or parameter that does this?

Upvotes: 1

Views: 657

Answers (1)

Scipione Sarlo

Reputation: 1508

Your code works fine for me once both packages are loaded:

library(magrittr)
library(tm)

a <- list("I am a big big big apple", "Petter Petter Peter Peter")
v <- VCorpus(VectorSource(a))
my_terms <- c("peter", "petter")
DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>%
  as.matrix()
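
If you want to cross-check the counts without tm, here is a minimal base R sketch. It assumes that lowercasing and splitting on non-letters is close enough to tm's tokenizer for this toy input; `docs`, `tokens` and `counts` are names made up for the illustration.

```r
docs <- c("I am a big big big apple", "Petter Petter Peter Peter")
my_terms <- c("peter", "petter")

# Lowercase, then split on anything that is not a letter (a simplification,
# not tm's actual tokenizer).
tokens <- strsplit(tolower(docs), "[^a-z]+")

# factor(levels = my_terms) keeps zero counts, so a document without any
# dictionary word still gets its own row.
counts <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = my_terms))),
  integer(length(my_terms))
))
dimnames(counts) <- list(Docs = as.character(seq_along(docs)), Terms = my_terms)
counts
```

The first row is all zeros and the second row shows 2 for both terms, so the empty document stays aligned with its metadata.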

Upvotes: 0
