Reputation: 55
I've been trying to download all the files on this page (https://apps.fs.usda.gov/fia/datamart/datamart_excel.html) in bulk, but am having some issues.
All the filenames follow the pattern '{state abbreviation}.xlsm', so I can download a single file with requests using code like this:
import requests

url = 'https://apps.fs.usda.gov/fia/datamart/Workbooks/WA.xlsm'
r = requests.get(url)
with open('WA.xlsm', 'wb') as f:
    f.write(r.content)
I believe there should be a way to incorporate this into a for loop to get all of the files, but I'm at a loss. Any advice?
Thanks!
Upvotes: 0
Views: 1101
Reputation: 11612
Just to add on to @balderman's answer: if you have multiple states to get, it might be slightly more efficient to use a threading approach, since the downloads are I/O-bound. Here is a straightforward example using concurrent.futures:
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from time import time

import requests

states = ['WA', 'CA', 'VA', 'NC']  # TODO add more states

out_dir = Path('temp_files')
out_dir.mkdir(exist_ok=True)

def get_content(state: str) -> bytes:
    url = f'https://apps.fs.usda.gov/fia/datamart/Workbooks/{state}.xlsm'
    r = requests.get(url)
    return r.content

start = time()
# cap the pool at 10 concurrent downloads
with ThreadPoolExecutor(max_workers=min(10, len(states))) as pool:
    # pool.map preserves input order, so results line up with states
    for state, content in zip(states, pool.map(get_content, states)):
        with open(out_dir / f'{state}.xlsm', 'wb') as f:
            f.write(content)
print('Download ThreadExecutor took', time() - start)

# Compare times with the sequential version below
# start = time()
# for state in states:
#     b = get_content(state)
#     with open(out_dir / f'{state}.xlsm', 'wb') as f:
#         f.write(b)
# print('Download took', time() - start)
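A possible refinement, not part of the answer above: requests.Session is not documented as thread-safe, so a common pattern is to give each worker thread its own session via threading.local, which reuses connections to the server (HTTP keep-alive) across that thread's downloads. A minimal sketch, where get_session is a hypothetical helper I'm introducing:

from concurrent.futures import ThreadPoolExecutor
import threading

import requests

thread_local = threading.local()

def get_session() -> requests.Session:
    # Lazily create one Session per worker thread; per-thread reuse
    # gives connection pooling without sharing a Session across threads.
    if not hasattr(thread_local, 'session'):
        thread_local.session = requests.Session()
    return thread_local.session

def get_content(state: str) -> bytes:
    url = f'https://apps.fs.usda.gov/fia/datamart/Workbooks/{state}.xlsm'
    r = get_session().get(url)
    r.raise_for_status()  # fail loudly on 404/500 instead of saving an error page
    return r.content

Swap this get_content in for the one above and the ThreadPoolExecutor loop works unchanged.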
Upvotes: 1
Reputation: 23815
Try the loop below:
import requests

states = ['WA', 'CA']  # TODO add more states

for state in states:
    url = f'https://apps.fs.usda.gov/fia/datamart/Workbooks/{state}.xlsm'
    r = requests.get(url)
    with open(f'{state}.xlsm', 'wb') as f:
        f.write(r.content)
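One caveat to add, as a sketch rather than part of the answer: requests.get saves whatever the server returns, so a missing workbook would leave an HTML error page on disk under an .xlsm name. A defensive variant, assuming you'd rather skip a failed download than abort:

import requests

states = ['WA', 'CA']  # TODO add more states

for state in states:
    url = f'https://apps.fs.usda.gov/fia/datamart/Workbooks/{state}.xlsm'
    r = requests.get(url)
    if not r.ok:
        # Skip states whose workbook is missing or errored (e.g. HTTP 404)
        print(f'Skipping {state}: HTTP {r.status_code}')
        continue
    with open(f'{state}.xlsm', 'wb') as f:
        f.write(r.content)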
Upvotes: 1