arndtupb

Reputation: 62

Convert Corpus from quanteda to tm

My data, mycorpus, is a quanteda corpus (created with quanteda's corpus() function), and I need to convert it to a corpus for the tm package. I know about quanteda's convert() function, but it only converts a document-feature matrix to tm. Is there a quick fix I am missing? Calling tm's VCorpus(mycorpus) throws the error message "missing source".

Upvotes: 1

Views: 397

Answers (2)

Ken Benoit

Reputation: 14902

You can construct a tm VCorpus directly by wrapping the quanteda corpus in VectorSource() and passing the result to VCorpus(), because a quanteda corpus is just a special character vector.

library("tm")
## Loading required package: NLP

# from version 3.0 of quanteda
data(data_corpus_inaugural, package = "quanteda")

VCorpus(VectorSource(data_corpus_inaugural))
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 59
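
For your own object, a minimal sketch of the same idea, assuming a reasonably recent quanteda where as.character() works on a corpus. The two-document mycorpus below is a made-up stand-in for your data, and the explicit as.character() step is only there so that tm receives a plain character vector:

```r
library("quanteda")
library("tm")

# Hypothetical stand-in for the asker's corpus
mycorpus <- corpus(c(doc1 = "This is the first text.",
                     doc2 = "And this is the second."))

# Extract the texts as a plain character vector, then hand them to tm
txt <- as.character(mycorpus)
tm_corpus <- VCorpus(VectorSource(txt))

# Quick checks: number of documents and the text of the first one
length(tm_corpus)
content(tm_corpus[[1]])
```

This sketch only carries over the texts; it does not try to move document variables (docvars) into tm metadata.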

However... Do you really want/need to do this?

Upvotes: 1

phiver

Reputation: 23608

If you have a dfm you can just use the as.DocumentTermMatrix function from the tm package.

For a dfm called my_dfm, the line of code below does the conversion. You need to specify the weighting of the dtm, but for counts coming from quanteda it is simply weightTf.

my_dtm <- as.DocumentTermMatrix(my_dfm, weighting = weightTf)
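
As a rough alternative, the convert() function mentioned in the question can do the same job from the quanteda side. This is a sketch, assuming both packages are installed; the dfm is built from quanteda's bundled data_corpus_inaugural purely for illustration:

```r
library("quanteda")
library("tm")

# Illustrative dfm built from quanteda's bundled inaugural corpus
my_dfm <- dfm(tokens(data_corpus_inaugural))

# Convert the dfm to a tm DocumentTermMatrix
my_dtm <- convert(my_dfm, to = "tm")

# Inspect the class and dimensions of the result
class(my_dtm)
dim(my_dtm)
```

Either route should give you a DocumentTermMatrix with one row per document.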

Upvotes: 0
