R: Subscript out of bounds when using tm function Corpus on LexisNexis-data

Question

I'm trying to create a corpus of LexisNexis articles with the tm package. The articles were exported from LexisNexis as .html and are read into R with the tm.plugin.lexisnexis package, like so:

> library("tm")
> library("tm.plugin.lexisnexis")
> src <- LexisNexisSource("~/Desktop/lexisnexis.html")

Following the instructions in the tm.plugin.lexisnexis documentation, I then create a corpus with the tm package, like so:

> data <- Corpus(src, readerControl = list(language = NA))
Error in getNodeSet(tree, "//div[@class = 'c3']/p[@class = 'c1']/span[@class = 'c4']")[[1]] : 
  subscript out of bounds

What does this error mean, and how do I fix it?
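
For context, the error text itself can be reproduced in isolation. The sketch below assumes that the XPath query matched no nodes in the exported file, so that the result of getNodeSet() is an empty list and taking its first element with [[1]] fails. The empty list is a stand-in for that result, not output from the actual file, and the guarded lookup at the end is only an illustration, not code from tm.plugin.lexisnexis:

```r
# Stand-in for what getNodeSet() would return if the XPath matched nothing
# (assumption: a zero-length list; the real object comes from the XML package).
nodes <- list()

# How many nodes matched? Zero here.
n_matched <- length(nodes)

# Taking the first element of an empty list raises the error from the question.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(nodes[[1]], error = function(e) conditionMessage(e))
print(msg)

# A guarded lookup (illustrative only): check the length before indexing.
first_node <- if (length(nodes) > 0) nodes[[1]] else NA
```

If this is what happens, the message says nothing about the corpus itself: it only means the expected span element with class c4 was not found at that position in the HTML.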

Example html-data: link
