Dom

Reputation: 1053

Downloading multiple files using purrr

I am trying to download all of the Excel files at: https://www.grants.gov.au/reports/gaweeklyexport

Using the code below I get errors similar to the following for each link (77 in total):

[[1]]$error
<simpleError in download.file(.x, .y, mode = "wb"): scheme not supported in URL '/Reports/GaWeeklyExportDownload?GaWeeklyExportUuid=0db183a2-11c6-42f8-bf52-379aafe0d21b'>

I get this error when trying to iterate over the full list, but when I call download.file on an individual list item it works fine.

I would be grateful if someone could tell me what I have done wrong or suggest a better way of doing it.

The code that produces the error:


library(tidyverse)
library(rvest)

# Reading links to the Excel files to be downloaded
url <- "https://www.grants.gov.au/reports/gaweeklyexport"

webpage <- read_html(url)

# The list of links to the Excel files
links <- html_attr(html_nodes(webpage, '.u'), "href")

# Creating names for the files to supply to the download.file function
wb_names = str_c(1:77, ".xlsx")

# Wrapping download.file in purrr's safely() so that a dead link doesn't stop the whole run
safe_download <- safely(~ download.file(.x , .y, mode = "wb"))

# Combining links, file names, and the function returns an error
map2(links, wb_names, safe_download)


Answers (1)

Ronak Shah

Reputation: 389055

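The href values scraped from the page are site-relative paths (for example '/Reports/GaWeeklyExportDownload?...'), which is why download.file reports "scheme not supported". You need to prepend the site's origin, https://www.grants.gov.au, to each href to get an absolute URL that can be used to download the file.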
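The prefixing step can be sketched offline in base R. This is a minimal illustration, not part of the original answer: `to_absolute()` is a hypothetical helper, and it assumes every relative href starts with "/" and that the origin is written without a trailing slash.

```r
# Hypothetical helper: turn site-relative hrefs into absolute URLs.
# Assumes relative hrefs start with "/" and `origin` has no trailing slash.
to_absolute <- function(href, origin = "https://www.grants.gov.au") {
  # TRUE for strings that already carry a scheme such as "https://"
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", href)
  ifelse(has_scheme, href, paste0(origin, href))
}

hrefs <- c(
  "/Reports/GaWeeklyExportDownload?GaWeeklyExportUuid=0db183a2-11c6-42f8-bf52-379aafe0d21b",
  "https://www.grants.gov.au/reports/gaweeklyexport"
)

abs_urls <- to_absolute(hrefs)
print(abs_urls)
```

The first element gains the origin and the second is left untouched. For pages that mix other kinds of relative links, `xml2::url_absolute(href, base)` is a more general way to resolve them against the page URL.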

library(rvest)
library(purrr)

url <- "https://www.grants.gov.au/reports/gaweeklyexport"

webpage <- read_html(url)
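# The hrefs on the page are site-relative (they start with "/"), so prepend the site origin
links <- paste0('https://www.grants.gov.au', html_attr(html_nodes(webpage, '.u'), "href"))

# Wrap download.file so that one failed download doesn't stop the rest
safe_download <- safely(~ download.file(.x , .y, mode = "wb"))

# Creating names for the files to supply to the download.file function,
# one per link rather than a hard-coded count
wb_names <- paste0(seq_along(links), ".xlsx")
map2(links, wb_names, safe_download)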
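Because safely() captures errors instead of raising them, it is worth checking the returned list to see which downloads failed. The sketch below is one way to do that. It runs offline, so `fake_download()` is a hypothetical stand-in for download.file and the two links are made-up examples.

```r
library(purrr)

# Hypothetical stand-in for download.file() so the sketch runs offline:
# it errors for anything that is not an absolute http(s) URL.
fake_download <- function(url, destfile) {
  if (!grepl("^https?://", url)) {
    stop("scheme not supported in URL '", url, "'")
  }
  0L  # download.file() returns 0 (invisibly) on success
}
safe_download <- safely(fake_download)

# Made-up links: one absolute, one relative
links <- c("https://example.org/a.xlsx", "/Reports/b.xlsx")
wb_names <- paste0(seq_along(links), ".xlsx")

results <- map2(links, wb_names, safe_download)

# Each element is list(result = ..., error = ...); error is NULL on success
failed <- map_lgl(results, ~ !is.null(.x$error))
failed_links <- links[failed]

# Error messages for the calls that failed
messages <- map_chr(results[failed], ~ conditionMessage(.x$error))
print(failed_links)
print(messages)
```

With the real download.file in place of the stand-in, the same `failed` vector shows which links to retry.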

