I am using the tm package in R to do some text mining. I have a matrix of term frequencies where every row is a document, every column is a word, and every cell is the frequency of that word in that document. I am trying to convert it to a DocumentTermMatrix object, but I can't find a function that does this. It looks like tm's sources usually expect the documents themselves, not a precomputed matrix.
I've tried as.DocumentTermMatrix(), but it asks for an argument "weighting" and gives the following error:
Error in .TermDocumentMatrix(t(x), weighting) :
argument "weighting" is missing, with no default
Here is the code for a simple reproducible example:
docs = matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) = paste0("doc", 1:10)
colnames(docs) = c("grad", "school", "is", "sleep", "deprivation")
So I need to coerce the matrix docs into a DocumentTermMatrix.
Using your code example, you can do the following:
library(tm)
docs = matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) = paste0("doc", 1:10)
colnames(docs) = c("grad", "school", "is", "sleep", "deprivation")
# as.DocumentTermMatrix() has no default weighting, so pass one explicitly
dtm <- as.DocumentTermMatrix(docs, weighting = weightTfIdf)
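Because the cells in the question already hold raw counts, passing weightTf keeps them unchanged. The sketch below assumes tm is installed and reuses the question's toy matrix. Note that in this particular matrix every term occurs in every document (sample(1:10, ...) never draws a zero), so the inverse document frequency is log2(10/10) = 0 and weightTfIdf turns every cell into 0.

```r
library(tm)

# Toy matrix from the question: rows = documents, columns = terms, cells = counts
docs <- matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) <- paste0("doc", 1:10)
colnames(docs) <- c("grad", "school", "is", "sleep", "deprivation")

# weightTf keeps the raw term frequencies as they are
dtm_tf <- as.DocumentTermMatrix(docs, weighting = weightTf)
inspect(dtm_tf)

# Compare against the input, aligning rows and columns by name first
all.equal(as.matrix(dtm_tf)[rownames(docs), colnames(docs)], docs,
          check.attributes = FALSE)

# Every term appears in every document here, so idf = log2(10 / 10) = 0
dtm_tfidf <- as.DocumentTermMatrix(docs, weighting = weightTfIdf)
all(as.matrix(dtm_tfidf) == 0)
```

On real data, where terms are spread unevenly across documents, weightTfIdf gives non-zero weights as expected.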
The help page for DocumentTermMatrix (?DocumentTermMatrix) says the following under Arguments:
weighting: A weighting function capable of handling a TermDocumentMatrix. It defaults to weightTf for term frequency weighting. Available weighting functions shipped with the tm package are weightTf, weightTfIdf, weightBin, and weightSMART.
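The other shipped weightings are passed the same way. A sketch, again assuming tm is installed and using the question's toy matrix: weightBin reduces every non-zero count to 1, and weightSMART takes a three-letter SMART spec, so here it is wrapped in a one-argument WeightFunction. The spec "lnc" (logarithmic tf, no idf, cosine normalization) and the name weightLnc are illustrative choices only.

```r
library(tm)

docs <- matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) <- paste0("doc", 1:10)
colnames(docs) <- c("grad", "school", "is", "sleep", "deprivation")

# Binary weighting: 1 if the term occurs in the document, 0 otherwise
dtm_bin <- as.DocumentTermMatrix(docs, weighting = weightBin)

# SMART "lnc": 1 + log2(tf), no idf, cosine-normalized per document.
# weightSMART needs its spec argument, so wrap it (weightLnc is a hypothetical name).
weightLnc <- WeightFunction(function(m) weightSMART(m, spec = "lnc"),
                            "SMART lnc", "lnc")
dtm_lnc <- as.DocumentTermMatrix(docs, weighting = weightLnc)

inspect(dtm_bin)
inspect(dtm_lnc)
```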
Depending on your needs, specify the weighting function to use with your document-term matrix, or write your own.
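As a minimal sketch of writing your own, assuming tm is installed: tm's WeightFunction() constructor wraps a function that receives the sparse matrix and returns it reweighted, with a "weighting" attribute describing the scheme. The log-scaled weighting below and its name weightLogTf are hypothetical, chosen only for illustration.

```r
library(tm)

# Hypothetical custom weighting (not part of tm): log2(1 + tf)
weightLogTf <- WeightFunction(function(m) {
  # m is a sparse triplet matrix, so only the stored non-zero counts in m$v
  # are transformed; zeros stay zero because log2(1 + 0) = 0
  m$v <- log2(1 + m$v)
  attr(m, "weighting") <- c("log term frequency", "ltf")
  m
}, "log term frequency", "ltf")

docs <- matrix(sample(1:10, 50, replace = TRUE), byrow = TRUE, ncol = 5, nrow = 10)
rownames(docs) <- paste0("doc", 1:10)
colnames(docs) <- c("grad", "school", "is", "sleep", "deprivation")

dtm_log <- as.DocumentTermMatrix(docs, weighting = weightLogTf)
inspect(dtm_log)
```

Because this transform is element-wise, it does not matter whether the function receives a document-term or a term-document matrix. A weighting that depends on document lengths or document frequencies would have to check the orientation first.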