Evan Hagen

Reputation: 11

How can I download many small files quickly? (Not bandwidth limited)

I need to download ~50 CSV files in python. Based on the Google Chrome network stats, the download takes only 0.1 seconds, while the request takes about 7 seconds to process.

I am currently using headless Chrome to make the requests. I tried multithreading, but from what I can tell the browser doesn't support it (it can't start another request before the first one finishes processing). I don't think multiprocessing is an option either, since this script will be hosted on a virtual server.

My next idea is to use the requests module instead of headless Chrome, but I am having trouble connecting to the company network without a browser. Will that even work? Are there other solutions? Could I do something with multiple driver instances, or with multiple tabs on a single driver (a rough sketch of the multiple-driver idea follows my code below)? Thanks!

Here's my code:

from multiprocessing.pool import ThreadPool  # note: lowercase "multiprocessing"

driver = ChromeDriver()  # my helper that starts headless Chrome
Login(driver)            # my helper that logs the driver in

def getFile(item):
    # All threads share the single driver, so the requests end up serialized.
    driver.get(url.format(item))

updateSet = blah  # placeholder for the ~50 items to fetch
pool = ThreadPool(len(updateSet))
for item in updateSet:
    pool.apply_async(getFile, (item,))

pool.close()
pool.join()
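
For reference, here is roughly what I had in mind with multiple driver instances (just a sketch; ChromeDriver, Login, url, and updateSet are my own helpers/placeholders from above, and each worker thread keeps its own driver):

from multiprocessing.pool import ThreadPool
import threading

local = threading.local()  # one driver per worker thread

def getFileOwnDriver(item):
    # Lazily create and log in a driver the first time this thread runs a job.
    if not hasattr(local, 'driver'):
        local.driver = ChromeDriver()  # my helper from above
        Login(local.driver)            # my helper from above
    local.driver.get(url.format(item))

pool = ThreadPool(8)  # a handful of drivers, not one per file
pool.map(getFileOwnDriver, updateSet)
pool.close()
pool.join()

Would something like that actually run the downloads in parallel, or would the drivers still block each other?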

Upvotes: 1

Views: 144

Answers (1)

pauliesnug

Reputation: 205

For the requests module, maybe try setting the User-Agent header to a browser string such as Chrome's, e.g. Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36, so the server treats the request like it came from a browser.

Some example code:

import requests

url = 'SOME URL'

headers = {
    'User-Agent': 'user agent here',
    'From': '[email protected]'  # This is another valid field
}

response = requests.get(url, headers=headers)
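
If that gets you past the server, you can combine the same headers with a thread pool so the ~50 downloads overlap instead of running one after another. A rough sketch (URL_TEMPLATE, items, and the cookie value are placeholders for your real endpoint and login, so adjust them to however you authenticate):

import requests
from concurrent.futures import ThreadPoolExecutor

URL_TEMPLATE = 'SOME URL WITH {}'  # placeholder, e.g. a report URL ending in {}.csv
items = range(50)                  # placeholder ids for the ~50 files

headers = {
    'User-Agent': 'user agent here',
}
cookies = {'sessionid': 'value copied from the browser'}  # placeholder login cookie

def fetch(item):
    # Each call runs in its own worker thread, so the slow server-side
    # processing for different files overlaps in time.
    resp = requests.get(URL_TEMPLATE.format(item),
                        headers=headers, cookies=cookies, timeout=30)
    resp.raise_for_status()
    with open('{}.csv'.format(item), 'wb') as f:
        f.write(resp.content)

with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(fetch, items))  # list() forces completion and surfaces errors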

Upvotes: 1
