Reputation: 195
I'm trying to scrape some websites using RSelenium. However, it seems the websites detect my scraping attempts. Would it be possible to introduce some time gaps between each scrape? My code is this:
library('XML')
library('RSelenium')
checkForServer() # search for and download Selenium Server java binary. Only need to run once.
startServer() # run Selenium Server binary
remDr <- remoteDriver(browserName="firefox", port=4444) # instantiate remote driver to connect to Selenium Server
remDr$open(silent=T) # open web browser
page_sub = read.csv("indigogo_edu_us.csv")
url_list = as.vector(page_sub$full_url[1:3])
scrape = function(url_list){
  remDr$navigate(url_list) # navigates to webpage
  elem <- remDr$findElement(using="class", value="i-description")
  elemtxt <- elem$getElementAttribute("outerHTML")[[1]]
  elemxml <- htmlTreeParse(elemtxt, useInternalNodes=T)
  fundList <- unlist(xpathApply(elemxml, '//input[@title]', xmlGetAttr, 'title')) # parses out just the fund name and ticker using XPath
  page = as.data.frame(xpathSApply(elemxml, '//div[@class="i-description"]', xmlValue, encoding="UTF-8"))
  names(page)[1] = "description"
  page # return the data frame so lapply() collects it
}
cc = lapply(url_list, scrape)
Upvotes: 0
Views: 603
Reputation: 70653
Of course, Sys.sleep(). You can also draw the pause length from a random number generator so the delays vary rather than looking like a fixed interval. Something along the lines of
Sys.sleep(runif(1, min = 3, max = 11))
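For example, here is a minimal sketch that reuses the scrape() function from your question (trimmed to the parts relevant to the pause) and sleeps for a random 3-11 seconds before each page load; the bounds are arbitrary and worth tuning for the site you are scraping:

library('XML')
library('RSelenium')

scrape_politely = function(url){
  Sys.sleep(runif(1, min = 3, max = 11)) # random pause before each request
  remDr$navigate(url)
  elem <- remDr$findElement(using="class", value="i-description")
  elemtxt <- elem$getElementAttribute("outerHTML")[[1]]
  elemxml <- htmlTreeParse(elemtxt, useInternalNodes=T)
  page = as.data.frame(xpathSApply(elemxml, '//div[@class="i-description"]', xmlValue, encoding="UTF-8"))
  names(page)[1] = "description"
  page # return the scraped description
}

cc = lapply(url_list, scrape_politely)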
Upvotes: 2