Error when importing tm VCorpus into quanteda corpus

Question

This code snippet worked fine until I updated R (3.6.3) and RStudio (1.2.5042) yesterday, though it is not obvious to me that the update is the source of the problem.

In a nutshell, I convert 91 PDF files into a volatile corpus named Vcorp and confirm that it is a volatile corpus as follows:

> Vcorp <- VCorpus(VectorSource(citiesText)) 
> class(Vcorp)
[1] "VCorpus" "Corpus"

Then I attempt to import this tm VCorpus into quanteda, but keep getting an error message that I did not get before (e.g., the day before the update).

> data(Vcorp, package = "tm")   
> citiesCorpus <- corpus(Vcorp)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 8714, 91

Any suggestions? Thank you.

Ken Benoit · Accepted Answer

Impossible to know the exact problem without a) version information on your packages and b) a reproducible example.
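As a rough sketch of how the version information could be gathered, the snippet below prints the R version and the installed versions of tm and quanteda. The helper name `pkg_version` is hypothetical and introduced only for illustration. It returns `NA` for a package that is not installed.

```r
# Sketch: collect version details that help diagnose a break after an upgrade.
# Only base R and utils are assumed; tm and quanteda are reported if installed.
r_version <- R.version.string

# Hypothetical helper: installed version as text, or NA if the package is missing.
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(c("tm", "quanteda"), pkg_version, character(1))

print(r_version)
print(versions)
```

Calling `sessionInfo()` gives a fuller picture, including the versions of all attached packages.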

Why use tm at all? You could have created a quanteda corpus directly as:

corpus(citiesText)
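One possible explanation for the row mismatch (8714 versus 91), which the question does not confirm, is that citiesText holds several strings per file (for example one per page or line) rather than a single string per file. Under that assumption, a minimal sketch of collapsing each file to one text before building the corpus might look like this. The sample data and the name `citiesFlat` are hypothetical stand-ins.

```r
# Hypothetical stand-in for citiesText: a list with one character vector per
# document, where each vector holds several pieces (for example, pages).
citiesText <- list(
  doc1 = c("Page one of the first file.", "Page two of the first file."),
  doc2 = c("Only page of the second file.")
)

# Collapse each document to a single string so there is one text per document.
citiesFlat <- vapply(
  citiesText,
  function(pieces) paste(pieces, collapse = "\n"),
  character(1)
)

length(citiesFlat)  # one element per document

# If quanteda is installed, the flattened vector can be passed to corpus().
if (requireNamespace("quanteda", quietly = TRUE)) {
  citiesCorpus <- quanteda::corpus(citiesFlat)
}
```

If `length(citiesText)` already equals the number of files and every element is a single string, this step changes nothing and the cause lies elsewhere.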

Converting a VCorpus works fine for me.

library("quanteda")
## Package version: 2.0.1

library("tm")
packageVersion("tm")
## [1] ‘0.7.7’

reut21578 <- system.file("texts", "crude", package = "tm")
VCorp <- VCorpus(
  DirSource(reut21578, mode = "binary"),
  list(reader = readReut21578XMLasPlain)
)

corpus(VCorp)
## Corpus consisting of 20 documents and 16 docvars.
## text1 :
## "Diamond Shamrock Corp said that effective today it had cut i..."
## 
## text2 :
## "OPEC may be forced to meet before a scheduled June session t..."
## 
## text3 :
## "Texaco Canada said it lowered the contract price it will pay..."
## 
## text4 :
## "Marathon Petroleum Co said it reduced the contract price it ..."
## 
## text5 :
## "Houston Oil Trust said that independent petroleum engineers ..."
## 
## text6 :
## "Kuwait"s Oil Minister, in remarks published today, said ther..."
## 
## [ reached max_ndoc ... 14 more documents ]
