Sean Norton

Reputation: 287

RSelenium: Error while downloading files with Chrome

I am using RSelenium to download a number of .xls files. I got a passable solution using the following code to set up the server; it tells Chrome not to show a pop-up when I click the download link and specifies where to save the file. However, without fail, once I download the 101st file (saved as "report (100).xls"), the download pop-up begins appearing in the browser Selenium is driving.

eCaps <- list(
  chromeOptions = 
    list(prefs = list(
      "profile.default_content_settings.popups" = 0L,
      "download.prompt_for_download" = FALSE,
      "download.default_directory" = "mydownloadpath"
    )
    )
)

rd <- rsDriver(browser = "chrome", port = 4566L, extraCapabilities = eCaps)
browser <- rd$client  # remote driver object used by the download function below

The function to download then looks like:

vote.downloading <- function(url) {

  # NB: this function assumes the browser is already up and running,
  # with the download options set correctly

  # pause briefly between requests
  Sys.sleep(1.5)

  browser$navigate(url)

  # locate the download link and click it
  down_button <- browser$findElement(using = "css selector",
                                     "table:nth-child(4) tr:nth-child(3) a")
  down_button$clickElement()
}

For reference, the sites I'm getting the download from look like this: http://www.moscow_city.vybory.izbirkom.ru/region/moscow_city?action=show&root=774001001&tvd=4774001137463&vrn=4774001137457&prver=0&pronetvd=null&region=77&sub_region=77&type=427&vibid=4774001137463

The link used for the download is labeled "Версия для печати" ("Print version"), for those who don't know Russian.

I can't simply stop the function when the dialog starts popping up and pick up where I left off, because it's part of a larger function that scrapes links from drop-down menus leading to the pages that contain the download link. Restarting by hand would also be extremely annoying, as there are 400+ files to download.

Is there some way I can alter the Chrome profile or my scraping function to prevent the system dialog from popping up every 101 files? Or is there a better way altogether to get these files downloaded?

Upvotes: 1

Views: 632

Answers (1)

hrbrmstr

Reputation: 78842

No need for Selenium:

library(httr)

httr::GET(
  url = "http://www.moscow_city.vybory.izbirkom.ru/servlet/ExcelReportVersion",
  query = list(
    region="77",
    sub_region="77",
    root="774001001",
    global="null",
    vrn="4774001137457",
    tvd="4774001137463",
    type="427",
    vibid="4774001137463",
    condition="",
    action="show",
    version="null",
    prver="0",
    sortorder="0"
  ),
  write_disk("/tmp/report.xls"), ## CHANGE ME
  verbose()
) -> res

I assign the result to an object (res) so you can run warn_for_status() or other such checks on it.

It should be straightforward to wrap that in a function with parameters to make it more generic.
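
As a rough sketch of that wrapper (not tested against the live site): the names report_query() and download_report() are made up for illustration, and it is only an assumption that tvd and vibid are the fields that change from report to report, with the remaining values staying as in the example above. Adjust the defaults if your scraped links show otherwise.

```r
# Build the query list for the ExcelReportVersion servlet.
# Defaults are copied from the example request above; which fields actually
# vary between reports is an assumption, so check them against your links.
report_query <- function(tvd, vibid = tvd, vrn = "4774001137457",
                         root = "774001001", region = "77",
                         sub_region = region, type = "427") {
  list(
    region = region, sub_region = sub_region, root = root, global = "null",
    vrn = vrn, tvd = tvd, type = type, vibid = vibid, condition = "",
    action = "show", version = "null", prver = "0", sortorder = "0"
  )
}

# Download one report into dest_dir under a name derived from tvd, so files
# never collide and Chrome's "report (100).xls" numbering is not involved.
download_report <- function(tvd, dest_dir, ...) {
  dest <- file.path(dest_dir, paste0("report_", tvd, ".xls"))
  res <- httr::GET(
    url = "http://www.moscow_city.vybory.izbirkom.ru/servlet/ExcelReportVersion",
    query = report_query(tvd, ...),
    httr::write_disk(dest, overwrite = TRUE)
  )
  httr::warn_for_status(res)  # warn on HTTP errors without aborting a batch
  invisible(dest)
}
```

With a character vector of ids scraped from the drop-down links (tvd_ids here is hypothetical), usage might look like for (id in tvd_ids) { download_report(id, "mydownloadpath"); Sys.sleep(1.5) }.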

Upvotes: 1
