Reputation: 421
I'm using Python and Selenium to scrape a website full of text files (as URLs), and then using requests to download those txt files.
The code I'm using is as follows:
import os
import requests

r = requests.get(link, cookies=cookies)
# Checking for a successful connection to the server.
if r.status_code == 200:
    print("Downloading data for time %d, altitude %d" % (counter1, altitude))
    data = r.text  # Extracting the text from the file online
    file_name = os.path.join(path, fileName)
    with open(file_name, 'w') as w:
        w.write(data)  # the 'with' block closes the file automatically

# Closing browser
browser.close()
There are about 900-odd files to be downloaded, but after every 250-odd downloads/requests the script terminates with an error:
OSError: [Errno 24] Too many open files
I've made sure that the file being written to is closed. The same goes for Selenium: after each text file is downloaded, the chromedriver closes, and the loop moves on to the next URL. Has anyone else encountered this, and if so, what did you do to fix it?
Upvotes: 1
Views: 7208
Reputation: 421
Thanks for the suggestions.
I just realized that browser.close() closes the window but does not quit the instance of chromedriver. Since the initialization of the chromedriver was inside the loop that extracts the data files, the script kept opening new instances of chromedriver, eventually overloading my memory with over 200 instances.
The simple fix is to use browser.quit(), which completely quits the webdriver instance instead of just closing its window.
Better still, instead of creating a new instance at the start of every loop iteration, just call browser.get(URL) at the end of the loop, which redirects the one existing instance to the target URL, as sketched below.
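For reference, here's a minimal sketch of that single-instance pattern (the links, cookies, and path values below are placeholders standing in for the actual setup in my script):

import os
import requests
from selenium import webdriver

links = ["https://example.com/data1.txt"]  # placeholder for the scraped file URLs
cookies = {}                               # placeholder for the session cookies
path = "."                                 # placeholder output directory

browser = webdriver.Chrome()  # one chromedriver instance for the whole run
try:
    for counter1, link in enumerate(links):
        browser.get(link)  # redirect the existing instance instead of spawning a new one
        r = requests.get(link, cookies=cookies)
        if r.status_code == 200:
            file_name = os.path.join(path, "data_%d.txt" % counter1)
            with open(file_name, 'w') as w:  # 'with' closes the file handle on exit
                w.write(r.text)
finally:
    browser.quit()  # quit() ends the chromedriver process; close() only closes the window

With a single instance reused this way, the open-file count stays flat no matter how many of the 900 files are downloaded.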
Upvotes: 4