I am scraping a website for links using R's XML and RCurl packages, and I need to make several thousand calls. The script I use has the following form:
library(RCurl)
library(XML)

raw <- getURL("http://www.example.com", encoding = "UTF-8", .mapUnicode = FALSE)
parsed <- htmlParse(raw)
links <- xpathSApply(parsed, "//a/@href")
...
...
return(links)
When used a single time, there is no problem. However, when I apply it to a list of URLs (using sapply), I receive the following error:
Error in function (type, msg, asError = TRUE) : Recv failure: Connection reset by peer
If I retry the same request later, it usually succeeds. I am new to curl and web scraping, and I am not sure how to fix or avoid this.
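For reference, the calls are made roughly like this (getLinks is a stand-in name for a function wrapping the snippet above, and urls holds the several thousand URLs):

# getLinks and urls are stand-in names, not the actual objects in my script
allLinks <- sapply(urls, getLinks)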
Thank you in advance.
Try wrapping each call in try() and retrying after a short pause until it succeeds, something like this:
curl <- getCurlHandle()  # reuse one handle across all requests

for (i in seq_along(links)) {
  WebPage <- try(getURL(links[[i]], ssl.verifypeer = FALSE, curl = curl))
  # keep retrying until the request succeeds
  while (inherits(WebPage, "try-error")) {
    Sys.sleep(1)
    WebPage <- try(getURL(links[[i]], ssl.verifypeer = FALSE, curl = curl))
  }
}
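If the server keeps resetting connections, the while loop above can spin forever. A bounded variant with exponential backoff may be safer; this is a sketch, and fetchWithRetry, maxTries, and the backoff schedule are illustrative choices rather than part of the original answer:

library(RCurl)

# hypothetical helper: retry a URL a limited number of times,
# backing off 2, 4, 8, ... seconds between attempts
fetchWithRetry <- function(url, curl, maxTries = 5) {
  for (attempt in seq_len(maxTries)) {
    page <- try(getURL(url, ssl.verifypeer = FALSE, curl = curl), silent = TRUE)
    if (!inherits(page, "try-error")) return(page)
    Sys.sleep(2 ^ attempt)
  }
  NA_character_  # give up after maxTries failures
}

curl <- getCurlHandle()
pages <- sapply(links, fetchWithRetry, curl = curl)

Reusing a single curl handle across requests lets libcurl keep the underlying connection alive, which tends to reduce resets when making thousands of calls.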