yarbaur

Reputation: 75

RCurl error: Connection reset by peer

I am scraping a website for links using the XML and RCurl packages of R. I need to make multiple calls (several thousand).

The script I use is in the following form:

library(RCurl)
library(XML)

raw <- getURL("http://www.example.com", encoding = "UTF-8", .mapUnicode = FALSE)
parsed <- htmlParse(raw)
links <- xpathSApply(parsed, "//a/@href")

...
...
return(links)

When I run it once, there is no problem. However, when I apply it to a list of URLs (using sapply), I receive the following error:

Error in function (type, msg, asError = TRUE) : Recv failure: Connection reset by peer

If I retry the same request later, it usually succeeds. I am new to cURL and web scraping, and I am not sure how to fix or avoid this.

Thank you in advance

Upvotes: 0

Views: 1065

Answers (1)

Mauricio Romero

Reputation: 77

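Try something like this: wrap each request in try() and, if it fails, wait a second and request the same URL again until it succeeds.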

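library(RCurl)

curl <- getCurlHandle()  # one handle, reused for every request

for (i in seq_along(links)) {
  # try() returns a "try-error" object instead of aborting the loop
  WebPage <- try(getURL(links[[i]], ssl.verifypeer = FALSE, curl = curl))
  while (inherits(WebPage, "try-error")) {
    Sys.sleep(1)  # pause before retrying the same URL
    WebPage <- try(getURL(links[[i]], ssl.verifypeer = FALSE, curl = curl))
  }
}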
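
Note that the loop above keeps retrying forever if a URL never recovers. A bounded variant might look like the sketch below. The names `fetch_with_retry`, `max_tries` and `base_delay` are made up for illustration and are not part of RCurl, and doubling the delay after each failure is only one reasonable assumption about how to back off.

```r
# Hypothetical helper (not part of RCurl): call fetch(url) up to max_tries
# times, sleeping longer after each failure, and return NA if all tries fail.
fetch_with_retry <- function(url, fetch, max_tries = 5, base_delay = 1) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    # Turn an error into a value so the loop can inspect it
    result <- tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  warning("Giving up on ", url, ": ", conditionMessage(result))
  NA_character_
}

# Possible usage with RCurl (assumes a character vector `urls`):
# library(RCurl)
# pages <- sapply(urls, fetch_with_retry,
#                 fetch = function(u) getURL(u, encoding = "UTF-8"))
```

Passing the fetching function in as an argument keeps the retry logic separate from RCurl, so it can be tried out with a fake function before running it against the real site.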

Upvotes: 0
