Scraping pdf files from web

Question

This question was answered here (Web scraping pdf files from HTML) but the solution doesn't work for me on either my target url or the target url of the op. I'm not supposed to ask this question as an answer to the earlier post so I'm starting a new Q.

My code is exactly as per the op and the error message that I receive is "Error in download.file(links[i], destfile = save_names[i]) : invalid 'url' argument"

The code I'm using is:

install.packages("RCurl")
install.packages("XML")
library(XML)
library(RCurl)
url <- "https://www.bot.or.th/English/MonetaryPolicy/Northern/EconomicReport/Pages/Releass_Economic_north.aspx"
page   <- getURL(url)
parsed <- htmlParse(page)
links  <- xpathSApply(parsed, path="//a", xmlGetAttr, "href")
inds   <- grep("*.pdf", links)
links  <- links[inds]


regex_match <- regexpr("[^/]+$", links)
save_names <- regmatches(links, regex_match)

for(i in seq_along(links)){
  download.file(links[i], destfile=save_names[i])
  Sys.sleep(runif(1, 1, 5))

}

Any help much appreciated! Thanks

Scraping pdf files from web

Answers (1)

Related Questions