L3A

Reputation: 25

Increase execution speed for web scraping with for loops

The following code scrapes a travel company's site: for each origin and destination pair, it retrieves the bus lines that make the trip. The idea is to put together a table with all the routes and the companies that operate them. The code itself works and fulfills its goal. However, I need to perform this search for a list of about 650 cities. I left the code running for about 3 hours and it got through less than 5% of the cities. Any suggestions on how I can improve the performance of this code?

library(RCurl)
library(jsonlite)
library(rjson)
library(dplyr)


#LOAD LIST OF CITIES
cidades <- jsonlite::fromJSON("Listas Cidades Artesp.json")
empresasXlinhas <- data.frame()

#RCURL PARAMETERS
headers = c(
  "Accept" = "application/json, text/javascript, */*; q=0.01",
  "Accept-Language" = "en-US,en;q=0.9",
  "Connection" = "keep-alive",
  "Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8",
  "Cookie" = "__RequestVerificationToken_L1RyYW5zcG9ydGVDb2xldGl2bw2=tY-yKlWmbZvAJzMHmITkohPiIos5XkjDBwf1ZBfP_bYWdXJMBF2Qw3z_B-LRVo0kXjdnHqDqsbZ04Zij_PM-wAf4DWVKfnQskOhqo4ANSRc1",
  "Origin" = "http://extranet.artesp.sp.gov.br",
  "Referer" = "http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino?fbclid=IwAR3_hZwajHk_iyU085S1LDTqLCOYLHIZ5K825XgPGcB4tMI0EuCJpQNrJHM",
  "User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
  "X-Requested-With" = "XMLHttpRequest"
)

#WEB SCRAPING
for (x in cidades$Codigo){
  for (y in cidades$Codigo){
    
    if (x != y){
      params = paste("origem=",x,"&destino=",y,"&__RequestVerificationToken=Z-wXmGOb9pnQbmkfcQXmChT-6uc3YfGjftHwK4HnC9SDCaKmzIafo7AI3lChBY6YDBHdpT_X98mSHGAr_YrTNgKiepKxKraGu7p6PI7dV4g1", sep ="")
      res <- postForm("http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino/GetGrid", .opts=list(postfields = params, httpheader = headers, followlocation = TRUE), style = "httppost")
      if (res != "[]"){
        print(paste(x,y))
        empresa <- jsonlite::fromJSON(res)[[2]]
        empresa <- empresa %>% mutate(cod_origem = x, cod_destino = y)
        empresasXlinhas <- rbind(empresasXlinhas, empresa)
        
      }
    }
  }
}

Upvotes: 1

Views: 94

Answers (1)

robertdj

Reputation: 1117

I don't have access to the list of cities, so I cannot run the code to check how fast it is.

Some high-level considerations: instead of downloading and aggregating in the same loop, I suggest you first download everything and later read the data. This allows you to stop your script without having to start from scratch and instead pick up where you left off.

If you want to make async calls, consider the {httr2} package. Alternatively, consider rolling your own "download" function and using it with {furrr} like this (please forgive my shortening of the long lines; I think it makes for a clearer read, and it is just copying your code):

First, make a dataframe/tibble of the missing files:

all_input <- tidyr::crossing(
    origin = cidades$Codigo,
    destination = cidades$Codigo
) %>%
    mutate(
        filename = <some reasonable filename>
    ) %>%
    filter(
        origin != destination,
        !fs::file_exists(filename)
    )

The filename column could for example be

fs::path("raw_data", paste(origin, destination, sep = "_"), ext = "csv")

Then make a function that downloads and saves each response:

my_download_func <- function(origin, destination, filename) {
    headers <- c(...)
    params <- paste("origem=", origin, "&destino=", destination, ...)
    res <- postForm(...)
    empresa <- jsonlite::fromJSON(res)[[2]]
    readr::write_csv(empresa, filename)
}

Finally, go through the rows of all_input in parallel with {furrr}:

future::plan(future::multisession)
furrr::future_pwalk(all_input, my_download_func)
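
To complete the "download everything first, read later" idea, the saved files can then be combined in a separate step. A minimal sketch of that step (my addition, assuming the per-pair CSVs ended up in a raw_data/ folder as in the filename example above):

all_files <- fs::dir_ls("raw_data", glob = "*.csv")

# Stack every per-pair file into one table; the .id column keeps the
# source path so origin/destination can be recovered from the filename
empresasXlinhas <- purrr::map_dfr(all_files, readr::read_csv, .id = "filename")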

One final piece of advice is to consider some error handling when parsing res.
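
For instance, one sketch of that (not tested against the real endpoint) is to wrap the fromJSON() call inside my_download_func in tryCatch, so a single empty or malformed response is logged and skipped instead of aborting hours of downloads:

empresa <- tryCatch(
    jsonlite::fromJSON(res)[[2]],
    error = function(e) {
        # Report the failing pair and return NULL instead of stopping the run
        message("Could not parse response for ", origin, " -> ", destination, ": ", conditionMessage(e))
        NULL
    }
)
if (!is.null(empresa)) readr::write_csv(empresa, filename)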

Upvotes: 2
