Reputation: 67
New member here. Trying to download a large number of files from a website in R (but open to suggestions as well, such as wget.)
From this post, I understand I must create a vector with the desired URLs. My initial problem is writing this vector, since I have 27 states and 34 agencies within each state, and I must download one file per agency for every state. Whereas the state codes are always two characters, the agency codes are 2 to 7 characters long. The URLs would look like this:
http://website.gov/xx_yyyyyyy.zip
where xx is the state code and yyyyyyy the agency code, between 2 and 7 characters long. I am lost as to how to build such a loop.
I assume I can then download this list of URLs with a loop like the following:
for(i in seq_along(urls)){
  download.file(urls[i], destinations[i], mode = "wb")
}
Does that make sense?
(Disclaimer: an earlier version of this post was uploaded incomplete. My mistake, sorry!)
Upvotes: 3
Views: 2443
Reputation: 405
If all your agency codes are the same within each state code, you could use the code below to create your vector of URLs to loop through. (You will also need a vector of destinations of the same size; one way to build it is sketched after the code below.)
#Getting all combinations
States <- c("AA","BB")
Agency <- c("ABCDEFG","HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")
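If you just want each zip saved under its own name in the working directory, the matching destinations vector (same length, same order) could be built the same way, for example:
#Matching destination file names, one per URL
destinations <- paste0(AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")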
You can then try looping through each file with something like this:
#loop method
for(i in seq_along(AllCombinationsVec)){
  download.file(AllCombinationsVec[i], destinations[i], mode = "wb")
}
Here is another way of looping through items: the apply functions apply a function to every item in a list or vector.
#mapply method
mapply(function(x, y) download.file(x, y, mode = "wb"), x = AllCombinationsVec, y = destinations)
Upvotes: 0
Reputation: 78792
This will download them in batches and take advantage of the speedier simultaneous downloading capabilities of download.file() if the libcurl option is available on your installation of R:
library(purrr)
states <- state.abb[1:27]
agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
"BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
"CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
"DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")
walk(states, function(x) {
  map(x, ~sprintf("http://website.gov/%s_%s.zip", ., agencies)) %>%
    flatten_chr() -> urls
  download.file(urls, basename(urls), method = "libcurl")
})
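If you are not sure whether the libcurl option is available on your installation, capabilities() reports it, for example:
# TRUE means download.file() can take a vector of URLs with method = "libcurl"
capabilities("libcurl")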
Upvotes: 7
Reputation: 3121
This should do the job:
agency <- c("FAA", "DEA", "NTSB")
states <- c("AL", "AK", "AZ", "AR")

URLs <- paste0("http://website.gov/",
               rep(states, each = length(agency)),
               "_",
               rep(agency, times = length(states)),
               ".zip")
Then loop through the URLs vector to pull the zip files. It will be faster if you use an apply function.
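A minimal sketch of that apply approach, assuming each file should simply be saved under its URL's base name in the working directory:
# mapply() pairs each URL with its destination file name and downloads them one by one
invisible(mapply(download.file, URLs, basename(URLs), MoreArgs = list(mode = "wb")))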
Upvotes: 1