R: Introducing time intervals when scraping

Question

I'm trying to scrape some websites using RSelenium. However, the websites seem to detect my scraping attempts. Is it possible to introduce time gaps between scrapes? This is my code:

library('XML')
library('RSelenium')
checkForServer() # search for and download Selenium Server java binary.  Only need to run once.
startServer() # run Selenium Server binary
remDr <- remoteDriver(browserName="firefox", port=4444) # instantiate remote driver to connect to Selenium Server
remDr$open(silent=T) # open web browser

page_sub = read.csv("indigogo_edu_us.csv")

url_list = as.vector(page_sub$full_url[1:3])  

scrape = function(url){  # receives one URL per call from lapply()

  remDr$navigate(url) # navigates to webpage

  elem <- remDr$findElement(using="class", value="i-description") 
  elemtxt <- elem$getElementAttribute("outerHTML")[[1]] 
  elemxml <- htmlTreeParse(elemtxt, useInternalNodes=T)  

  fundList <- unlist(xpathApply(elemxml, '//input[@title]', xmlGetAttr, 'title')) # parses out just the fund name and ticker using XPath
  page = as.data.frame(xpathSApply(  elemxml,'//div[@class="i-description"]', xmlValue, encoding="UTF-8"))
  names(page)[1] = "description"
  page # return the data frame; otherwise the function returns only the string "description"
}
cc = lapply(url_list, scrape)

Roman Luštrik · Accepted Answer

Of course: use Sys.sleep() to pause between requests. You can also draw the pause length from a random number generator so the delays look less regular.

Something along the lines of

Sys.sleep(runif(1, min = 3, max = 11))
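
One way to apply this to a loop like the one in the question is to wrap the per-URL function so that every call is followed by a random pause. This is only a sketch: with_delay is a hypothetical helper name, not part of RSelenium or base R, and the 3 to 11 second range is simply the one used above.

```r
# Hypothetical helper (not from any package): returns a version of `f`
# that sleeps for a random interval after each call.
with_delay <- function(f, min = 3, max = 11) {
  function(...) {
    result <- f(...)                           # do the actual work first
    Sys.sleep(runif(1, min = min, max = max))  # then pause min..max seconds
    result                                     # hand back the original result
  }
}

# Assumed usage with the question's objects (requires a running Selenium session):
# cc <- lapply(url_list, with_delay(scrape))
```

Putting the pause in a wrapper leaves the scraping function itself unchanged, so the delay range can be tuned or removed in one place. Calling Sys.sleep directly as the first or last line inside scrape would work just as well.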
