Reputation: 17
This code snippet worked fine until I decided to update R (3.6.3) and RStudio (1.2.5042) yesterday, though I am not certain that the update is the source of the problem.
In a nutshell, I convert 91 pdf files into a volatile corpus named Vcorp and confirm that I created a volatile corpus as follows:
> Vcorp <- VCorpus(VectorSource(citiesText))
> class(Vcorp)
[1] "VCorpus" "Corpus"
Then I attempt to import this tm VCorpus into quanteda, but I keep getting an error message that I did not get before (e.g., the day before the update).
> data(Vcorp, package = "tm")
> citiesCorpus <- corpus(Vcorp)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 8714, 91
Any suggestions? Thank you.
Reputation: 14902
It is impossible to know the exact problem without (a) version information for your packages and (b) a reproducible example.
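For the version information, a minimal sketch along these lines would do. The variable names `pkgs` and `versions` are just illustrative, and the check with `requireNamespace()` is only there so the snippet runs even if one of the packages is missing:

```r
# Packages involved in the question
pkgs <- c("tm", "quanteda")

# Look up the installed version of each package, or NA if it is not installed
versions <- vapply(
  pkgs,
  function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  },
  character(1)
)

print(versions)

# R version string for the running session
cat(R.version.string, "\n")
```

Pasting the output of `sessionInfo()` into the question would cover the same ground in one call.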
Why use tm at all? You could have created a quanteda corpus directly as:
corpus(citiesText)
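One thing that may be worth checking, and this is only a guess without seeing your data: the error reports 8714 rows against 91, which would be consistent with citiesText being a list of 91 character vectors (for example one element per page or line) rather than 91 single strings. A rough sketch of that check, using a made-up two-document stand-in for citiesText (the name `citiesFlat` is hypothetical):

```r
# Hypothetical stand-in for citiesText: one character vector per PDF
citiesText <- list(
  c("page one of doc A", "page two of doc A"),
  c("only page of doc B")
)

# Compare the number of documents with the total number of text pieces
length(citiesText)        # 2 documents
sum(lengths(citiesText))  # 3 pieces of text

# Collapse each document into a single string so the counts match
citiesFlat <- vapply(citiesText, paste, character(1), collapse = "\n")
stopifnot(length(citiesFlat) == length(citiesText))

# The flattened vector could then be passed to quanteda directly (assumption:
# quanteda is installed), e.g.:
# citiesCorpus <- quanteda::corpus(citiesFlat)
```

If the two counts already agree for your real data, then this is not the cause and a reproducible example is still needed.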
Converting a VCorpus works fine for me.
library("quanteda")
## Package version: 2.0.1
library("tm")
packageVersion("tm")
## [1] ‘0.7.7’
reut21578 <- system.file("texts", "crude", package = "tm")
VCorp <- VCorpus(
  DirSource(reut21578, mode = "binary"),
  list(reader = readReut21578XMLasPlain)
)
corpus(VCorp)
## Corpus consisting of 20 documents and 16 docvars.
## text1 :
## "Diamond Shamrock Corp said that effective today it had cut i..."
##
## text2 :
## "OPEC may be forced to meet before a scheduled June session t..."
##
## text3 :
## "Texaco Canada said it lowered the contract price it will pay..."
##
## text4 :
## "Marathon Petroleum Co said it reduced the contract price it ..."
##
## text5 :
## "Houston Oil Trust said that independent petroleum engineers ..."
##
## text6 :
## "Kuwait's Oil Minister, in remarks published today, said ther..."
##
## [ reached max_ndoc ... 14 more documents ]