Reputation: 13
This question was answered here (Web scraping pdf files from HTML), but the solution doesn't work for me on either my target URL or the OP's target URL. Since I'm not supposed to ask this question as an answer to the earlier post, I'm starting a new one.
My code follows the OP's exactly, and the error message I receive is: "Error in download.file(links[i], destfile = save_names[i]) : invalid 'url' argument"
The code I'm using is:
install.packages("RCurl")
install.packages("XML")
library(XML)
library(RCurl)
url <- "https://www.bot.or.th/English/MonetaryPolicy/Northern/EconomicReport/Pages/Releass_Economic_north.aspx"
page <- getURL(url)
parsed <- htmlParse(page)
links <- xpathSApply(parsed, path="//a", xmlGetAttr, "href")
inds <- grep("\\.pdf$", links)  # match links ending in .pdf ("*.pdf" is glob syntax, not a regex)
links <- links[inds]
regex_match <- regexpr("[^/]+$", links)
save_names <- regmatches(links, regex_match)
for (i in seq_along(links)) {
  download.file(links[i], destfile = save_names[i])
  Sys.sleep(runif(1, 1, 5))
}
Any help much appreciated! Thanks
Upvotes: 0
Views: 774
Reputation: 13
Solved! I'm not sure why this works, but it does. I swapped the for loop for the following line and the downloads succeed:
Map(function(u, d) download.file(u, d, mode = 'wb'), links, save_names)
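A likely explanation (an assumption, not verified against the page): if any `<a>` tag on the page lacks an href attribute, xmlGetAttr returns NULL for it and xpathSApply cannot simplify its result to a character vector, so links comes back as a list. Single-bracket indexing on a list returns a one-element list, not a character string, which is exactly what triggers download.file's "invalid 'url' argument". Map, by contrast, hands each underlying element (the string itself) to the function. A minimal sketch of the difference:

```r
# Hypothetical list, as xpathSApply may return when some <a> tags lack href
links <- list("a.pdf", "b.pdf")

class(links[1])    # "list" -- what the for loop passed to download.file()
class(links[[1]])  # "character" -- what Map() passes to the function

# The original for loop should also work once the list is flattened
# to a plain character vector:
links <- unlist(links)
class(links[1])    # now "character"
```

The mode = 'wb' argument matters too on Windows: without it, download.file writes in text mode and can corrupt binary files such as PDFs.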
Upvotes: 0