Michael Davidson

Reputation: 1411

Convert Large CSV DTM to tm package DTM

I have a large CSV file (3.8 GB) that already holds a document-term matrix: each column is a term and each row is a document. I would like to convert it to a DocumentTermMatrix from the tm package.

I am skipping the read.csv step here, but you get the idea.

dtm <- structure(list(the = c(2L, 1L), apple = c(0L, 2L), dumb = c(1L, 0L)), .Names = c("the", "apple", "dumb"), class = "data.frame", row.names = c(NA, -2L))

I then don't know how to convert this to a formal tm package dtm:

c <- Corpus(DataframeSource(dtm))

That's wrong, obviously.

Thanks for any direction.

Upvotes: 0

Views: 466

Answers (1)

Ken Benoit

Reputation: 14902

This will do it:

tmDTM <- tm::as.DocumentTermMatrix(slam::as.simple_triplet_matrix(dtm),
                                   weighting = tm::weightTf)
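As a quick check, here is a minimal sketch of the same conversion run on the toy data from the question. The `doc1`, `doc2` row names are hypothetical document ids added only for illustration, and going through `as.matrix()` first is an assumption made here so the dimnames are explicit. For a 3.8 GB file that dense intermediate matrix may not fit in memory.

```r
library(tm)
library(slam)

# Toy data from the question: rows are documents, columns are terms
dtm <- data.frame(the = c(2L, 1L), apple = c(0L, 2L), dumb = c(1L, 0L))

# Dense matrix with explicit dimnames (document ids are made up for illustration)
m <- as.matrix(dtm)
rownames(m) <- paste0("doc", seq_len(nrow(m)))

# Sparse triplet form, then wrap it as a tm DocumentTermMatrix with raw counts
tmDTM <- as.DocumentTermMatrix(as.simple_triplet_matrix(m),
                               weighting = weightTf)

inspect(tmDTM)
```

`Terms(tmDTM)` and `Docs(tmDTM)` should then return the column and row names of the original table.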

The quanteda package has some nice implementations of this functionality as well.
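A sketch of the quanteda route, assuming a reasonably recent quanteda in which `as.dfm()` accepts a data frame and `convert()` supports `to = "tm"` (which also needs tm installed). The toy data is the one from the question.

```r
library(quanteda)

# Toy data from the question: rows are documents, columns are terms
dtm <- data.frame(the = c(2L, 1L), apple = c(0L, 2L), dumb = c(1L, 0L))

# Coerce the count table to a quanteda dfm (sparse document-feature matrix)
d <- as.dfm(dtm)

# Convert the dfm to a tm DocumentTermMatrix
tmDTM2 <- convert(d, to = "tm")
```

If the rest of the analysis can stay in quanteda, the `convert()` step may not be needed at all.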

Upvotes: 1
