questionMarc

Reputation: 67

Downloading multiple files in R with variable length, nested URLs

New member here. I'm trying to download a large number of files from a website in R (but I'm open to other suggestions as well, such as wget).

From this post, I understand I must create a vector with the desired URLs. My initial problem is to write this vector, since I have 27 states and 34 agencies within each state. I must download one file for each agency for all states. Whereas the state codes are always two characters, the agency codes are 2 to 7 characters long. The URLs would look like this:

http://website.gov/xx_yyyyyyy.zip

where xx is the state code and yyyyyyy is the agency code, between 2 and 7 characters long. I am lost as to how to build such a loop.

I assume I can then download this URL list with a loop like the following:

for (i in seq_along(urls)) {
  download.file(urls[i], destinations[i], mode = "wb")
}

Does that make sense?

(Disclaimer: an earlier version of this post went up incomplete. My mistake, sorry!)

Upvotes: 3

Views: 2443

Answers (3)

DataJack

Reputation: 405

If all your agency codes are the same within each state code, you could use the below to create your vector of URLs to loop through. (You will also need a vector of destinations of the same size; see the sketch after the code.)

#Getting all combinations
States <- c("AA", "BB")
Agency <- c("ABCDEFG", "HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")
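
For the destinations vector, a minimal sketch, assuming you just want to keep each file's original name in the working directory:

#reuse the file name from each URL as the local destination
destinations <- basename(AllCombinationsVec)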

You can then try looping through each file with something like this:

#loop method

for (i in seq_along(AllCombinationsVec)) {
  download.file(AllCombinationsVec[i], destinations[i], mode = "wb")
}

Apply functions are another way of looping through items: they apply a function to every element of a list or vector.

#mapply method

mapply(function(x, y) download.file(x, y, mode = "wb"),
       x = AllCombinationsVec, y = destinations)
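
For comparison, base R's Map() does the same pairwise iteration (a sketch using the same vectors as above):

#Map() is mapply() with SIMPLIFY = FALSE under the hood
Map(function(x, y) download.file(x, y, mode = "wb"),
    AllCombinationsVec, destinations)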

Upvotes: 0

hrbrmstr

Reputation: 78792

This will download them in batches and take advantage of the speedier simultaneous downloading capabilities of download.file() if the libcurl option is available on your installation of R:

library(purrr)

states <- state.abb[1:27]
agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
              "BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
              "CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
              "DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")

walk(states, function(x) {
  #build all agency URLs for this state, then fetch them as one batch
  map(x, ~sprintf("http://website.gov/%s_%s.zip", ., agencies)) %>%
    flatten_chr() -> urls
  download.file(urls, basename(urls), method = "libcurl")
})
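
To confirm the libcurl option is actually available on your installation before relying on the vectorized call, a quick check:

#download.file() only accepts a vector of URLs with method = "libcurl"
capabilities("libcurl")  #TRUE if your R was built with libcurl support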

Upvotes: 7

epo3

Reputation: 3121

This should do the job:

agency <- c("FAA", "DEA", "NTSB")
states <- c("AL", "AK", "AZ", "AR")

URLs <- paste0("http://website.gov/",
               rep(states, each = length(agency)),
               "_",
               rep(agency, times = length(states)),
               ".zip")

Then loop through the URLs vector to pull the zip files, as sketched below. An apply-family function works just as well if you prefer to avoid an explicit loop.
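
A minimal sketch of that loop, assuming each file should be saved under its original name:

for (u in URLs) {
  download.file(u, destfile = basename(u), mode = "wb")
}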

Upvotes: 1
