Sylababa

Reputation: 65

Count the number of tokens in a DocumentTermMatrix

I have a question about a DocumentTermMatrix. I would like to use the LDAvis package in R. To visualize the results of my LDA model, I need the number of tokens in every document. I don't have the text corpus for the DTM in question. Does anyone know how I can calculate the number of tokens for every document? Ideally, the output would be a list with each document's name and its token count.

Kind Regards, Tom

Upvotes: 0

Views: 187

Answers (1)

phiver

Reputation: 23608

You can use slam::row_sums. It calculates the row sums of a document-term matrix without first converting the DTM into a dense matrix. The function comes from the slam package, which is installed along with the tm package.

count_tokens <- slam::row_sums(dtm_goes_here)

# if you want a list
count_tokens_list <- as.list(slam::row_sums(dtm_goes_here))
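
If you want the document name next to its token count, something like the sketch below may help. The toy matrix, the document names (doc1, doc2) and the variable names are made up for illustration; substitute your own DTM. It assumes the DTM holds raw term frequencies (weightTf). The column sums are included because LDAvis::createJSON takes a term.frequency argument alongside doc.length.

```r
library(slam)
library(tm)

# Toy term counts standing in for the real DTM (hypothetical data).
counts <- matrix(c(2, 0, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("apple", "banana", "cherry")))

# Build a DocumentTermMatrix with raw term-frequency weighting.
dtm <- as.DocumentTermMatrix(slam::as.simple_triplet_matrix(counts),
                             weighting = weightTf)

# Tokens per document: row sums over the sparse matrix.
doc_length <- slam::row_sums(dtm)

# Document name and token count side by side.
token_counts <- data.frame(document = names(doc_length),
                           tokens = as.vector(doc_length))

# Corpus-wide frequency of each term: column sums.
term_frequency <- slam::col_sums(dtm)

print(token_counts)
```

Note that the row sums only equal token counts when the DTM stores raw counts. With tf-idf or another weighting the sums are not counts, and in any case only the tokens that survived preprocessing (stop word removal, minimum word length, and so on) are included.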

Upvotes: 2
