Reputation: 11
I need to download ~50 CSV files in python. Based on the Google Chrome network stats, the download takes only 0.1 seconds, while the request takes about 7 seconds to process.
I am currently using headless Chrome to make the requests. I tried multithreading, but from what I can tell, the browser doesn't support that (it can't make another request before the first request finishes processing). I don't think Multiprocessing is an option as this script will be hosted on a virtual server.
My next idea is to use the requests module instead of headless Chrome, but I am having issues connecting to the company network without a browser. Will this work, though? Any other solutions? Could I do something with multiple driver instances or multiple tabs on a single driver?Thanks!
Here's my code:
from Multiprocessing.pool import ThreadPool
driver=ChromeDriver()
Login(driver)
def getFile(item):
driver.get(url.format(item))
updateSet=blah
pool= ThreadPool(len(updateSet))
for item in updateSet:
pool.apply_async(getFile,(item,))
pool.close()
pool.join()
Upvotes: 1
Views: 144
Reputation: 205
For request maybe try setting the user agent string to a browser like Chrome, ex: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36.
Some example code:
import requests
url = 'SOME URL'
headers = {
'User-Agent': 'user agent here',
'From': '[email protected]' # This is another valid field
}
response = requests.get(url, headers=headers)
Upvotes: 1