Reputation: 1411
I have a large CSV file (3.8 GB) laid out with one column per term and one row per document. I would like to convert it to a DocumentTermMatrix (DTM) from the tm package.
I am skipping the read.csv step here, but you get the idea.
dtm <- structure(list(the = c(2L, 1L), apple = c(0L, 2L), dumb = c(1L, 0L)), .Names = c("the", "apple", "dumb"), class = "data.frame", row.names = c(NA, -2L))
I don't know how to convert this data frame into a proper tm DocumentTermMatrix:
c <- Corpus(DataframeSource(dtm))
That's wrong, obviously.
Thanks for any direction.
Upvotes: 0
Views: 466
Reputation: 14902
This will do it:
tmDTM <- tm::as.DocumentTermMatrix(slam::as.simple_triplet_matrix(dtm),
weighting = tm::weightTf)
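For anyone who wants to try this end to end, here is a minimal sketch on the toy data from the question. It assumes the tm and slam packages are installed. The name `counts`, the explicit `as.matrix()` step, and the `doc1`, `doc2` row names are my own illustrative additions, not something the packages require.

```r
library(tm)    # DocumentTermMatrix class and weighting functions
library(slam)  # sparse simple_triplet_matrix representation

# Toy stand-in for the result of read.csv(): rows are documents, columns are terms.
counts <- data.frame(the = c(2L, 1L), apple = c(0L, 2L), dumb = c(1L, 0L))

# Go through an explicit matrix first (an assumption made for clarity),
# and give the documents readable ids (hypothetical names).
m <- as.matrix(counts)
rownames(m) <- paste0("doc", seq_len(nrow(m)))

# Sparse triplet form, then the tm DocumentTermMatrix with term-frequency weighting.
tmDTM <- as.DocumentTermMatrix(as.simple_triplet_matrix(m), weighting = weightTf)

inspect(tmDTM)
```

Note that `as.matrix()` builds a dense matrix, so with a file of several gigabytes memory use is worth watching.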
The quanteda package has some nice implementations of this functionality as well.
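A rough sketch of the quanteda route, assuming a reasonably recent quanteda with tm also installed (the conversion to a tm object needs it). The name `counts` is illustrative, and it is worth checking `?as.dfm` and `?convert` for your installed version.

```r
library(quanteda)

# Same toy counts as in the question: rows are documents, columns are terms.
counts <- data.frame(the = c(2L, 1L), apple = c(0L, 2L), dumb = c(1L, 0L))

# Coerce the count table to a quanteda document-feature matrix (sparse).
dfmat <- as.dfm(counts)

# Convert to a tm DocumentTermMatrix if a tm object is specifically needed.
tmDTM <- convert(dfmat, to = "tm")
```

If the downstream analysis can work with a dfm directly, the final `convert()` step can be skipped.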
Upvotes: 1