Term frequency table to DocumentTermMatrix in tm R package

Question

I am using the tm package in R to do some text mining. I have a matrix of term frequencies where every row is a document, every column is a word, and every cell is the frequency of that word in that document. I am trying to convert it to a DocumentTermMatrix object, but I can't find a function that does this. It looks like the sources are usually the documents themselves.

I've tried as.DocumentTermMatrix(), but it asks for an argument "weighting" and gives the following error:

Error in .TermDocumentMatrix(t(x), weighting) :
argument "weighting" is missing, with no default

Here is the code for a simple reproducible example:

docs = matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) = paste0("doc", 1:10)
colnames(docs) = c("grad", "school", "is", "sleep", "deprivation")

So I need to coerce the matrix docs into a DocumentTermMatrix.

phiver · Accepted Answer

Using your code example, you can use the following:

docs = matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) = paste0("doc", 1:10)
colnames(docs) = c("grad", "school", "is", "sleep", "deprivation")

dtm <- as.DocumentTermMatrix(docs, weighting = weightTfIdf)

If you read the help for DocumentTermMatrix (?DocumentTermMatrix), you will see the following under Arguments:

weighting: A weighting function capable of handling a TermDocumentMatrix. It defaults to weightTf for term frequency weighting. Available weighting functions shipped with the tm package are weightTf, weightTfIdf, weightBin, and weightSMART.

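Depending on your needs, you have to specify which weighting function to use with your document-term matrix, or write one yourself. The error in your question shows that the default mentioned in the help is not applied when you coerce a plain matrix, so the argument has to be passed explicitly. If you want to keep your raw counts unchanged, pass weighting = weightTf instead of weightTfIdf.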
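As a minimal sketch of how the choice of weighting changes the result, the snippet below coerces the same matrix with weightTf and with weightBin. The set.seed() call and the object names dtm_tf and dtm_bin are my own additions for illustration and are not part of your original code.

```r
library(tm)

# Fixed seed (an assumption, added only so the example is reproducible)
set.seed(42)

docs <- matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) <- paste0("doc", 1:10)
colnames(docs) <- c("grad", "school", "is", "sleep", "deprivation")

# weightTf keeps the raw term frequencies as they are
dtm_tf <- as.DocumentTermMatrix(docs, weighting = weightTf)

# weightBin replaces every non-zero count with 1 (term present or absent)
dtm_bin <- as.DocumentTermMatrix(docs, weighting = weightBin)

# Print a summary and a sample of each matrix
inspect(dtm_tf)
inspect(dtm_bin)

# Convert back to a dense matrix to compare against the input
tf_mat <- as.matrix(dtm_tf)
bin_mat <- as.matrix(dtm_bin)
```

With weightTf, as.matrix(dtm_tf) should hold the same counts as docs, which is probably what you want if the frequencies are already computed and should not be rescaled.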
