DocumentTermMatrix in tm package does not return all words

Question

I'm creating a document-term matrix with the tm package in R, but some of the words in my corpus get lost somewhere in the process.

Here is an example. Let's say I have this small corpus:

library(tm)
crps <- " more hours to my next class bout to go home and go night night"
crps <- VCorpus(VectorSource(crps))

When I use DocumentTermMatrix() from the tm package, it returns these results:

dm <- DocumentTermMatrix(crps)
dm_matrix <- as.matrix(dm)
dm_matrix
#     Terms
# Docs and bout class home hours more next night
#    1   1    1     1    1     1    1    1     2

However, what I want (and expected) is:

# Docs and bout class home hours more next night my  go to
#  1   1    1     1    1     1    1    1     2   1   2  1

Why does DocumentTermMatrix() skip the words "my", "go" and "to"? Is there a way to control or change this behavior?

Answers (1)
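
The three missing words are all two characters long, which points at the wordLengths control option. tm's term counting keeps only terms whose length falls within wordLengths, and the documented default is c(3, Inf), so anything shorter than three characters is discarded. A minimal sketch of a fix, assuming the default tokenizer and reusing the sample text from the question (the names txt, dm_all and m_all are only illustrative):

```r
library(tm)

# Same sample text as in the question
txt <- " more hours to my next class bout to go home and go night night"
crps <- VCorpus(VectorSource(txt))

# wordLengths = c(min, max): keep terms of 1 or more characters.
# The default is c(3, Inf), which drops "my", "go" and "to".
dm_all <- DocumentTermMatrix(crps, control = list(wordLengths = c(1, Inf)))
m_all <- as.matrix(dm_all)

# Inspect the counts for the previously missing terms
m_all[, c("my", "go", "to")]
```

With this setting the short words should show up as columns. Note that "to" occurs twice in the sample text, so its count should be 2 rather than 1. Other control options such as stopwords or removePunctuation can also remove terms, but they are off by default in DocumentTermMatrix(), so they are unlikely to be the cause here.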