Reputation: 1072
I am scraping a website which has 2 versions at the moment, and when you visit the site you never know which one you are going to get. For this reason I have had to set up two separate files to scrape it.
For the sake of simplicity I have a master file which controls the running of the two files:
attempts = 0
while attempts < 10:
    try:
        try:
            runfile('file1.py')
        except SomeException:
            runfile('file2.py')
        break
    except:
        attempts += 1
So basically this keeps trying a maximum of 10 times until the correct version of the site meets the correct scraper file.
The problem with this is that the files launch a webdriver every time, so I can end up with several empty browsers clogging up the machine. Is there any command which can just close all webdriver instances? I cannot use driver.quit() because in the environment of this umbrella script, driver is not a recognized variable.
I also cannot use driver.quit() at the end of file1.py or file2.py because when file1.py encounters an error, it ceases to run and so the driver.quit() command will not be executed. I can't use a try / except because then my master file won't understand that there was an error in file1.py and thus won't run file2.py.
Upvotes: 0
Views: 122
Reputation: 38922
Handle the exception in individual runners, close the driver and raise a common exception that you then handle in the caller.
In file1.py and file2.py:
try:
    ...  # scraping routine
except Exception:
    # close the browser, then re-raise so the caller still sees the error
    driver.quit()
    raise
You can factor this out to the caller by initializing the driver in the caller, and passing the driver instance to functions instead of modules.
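A minimal sketch of that refactoring, under some assumptions: each scraper becomes a function that takes the driver, and a scraper signals "wrong site version" with a custom exception. The names SiteVersionMismatch, scrape_with_fallback and make_driver are hypothetical and do not come from the question; make_driver is any zero-argument callable returning an object with a quit() method, such as Selenium's webdriver.Chrome.

```python
class SiteVersionMismatch(Exception):
    """Hypothetical error a scraper raises when it meets the other site version."""


def scrape_with_fallback(make_driver, scrapers, max_attempts=10):
    """Try each scraper in order, retrying up to max_attempts times.

    make_driver: zero-argument callable returning an object with .quit()
    scrapers:    functions that take the driver and return scraped data
    """
    for _ in range(max_attempts):
        driver = make_driver()
        try:
            for scrape in scrapers:
                try:
                    return scrape(driver)
                except SiteVersionMismatch:
                    continue  # wrong version for this scraper, try the next one
        except Exception:
            pass  # any other failure: clean up below, then retry
        finally:
            driver.quit()  # runs on success, mismatch and error alike
    raise RuntimeError("no scraper succeeded")
```

Because the caller owns the driver, quit() is written once and no browser window outlives the attempt that opened it. With Selenium this could be called as scrape_with_fallback(webdriver.Chrome, [scrape_v1, scrape_v2]), where scrape_v1 and scrape_v2 stand in for the bodies of file1.py and file2.py.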
Upvotes: 1
Reputation: 308789
You can have a try..finally block in runfile.
def runfile(filename):
    driver = ...
    try:
        ...
    finally:
        # close the webdriver
        driver.quit()
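This also covers the worry in the question that handling the error would hide it from the master file: a finally block does not catch anything, so the exception still reaches the caller after the cleanup has run. A small self-contained sketch, using a stand-in StubDriver class (hypothetical, only so the example runs without a browser) and a hypothetical run_scraper helper:

```python
class StubDriver:
    """Stand-in for a Selenium driver; only records whether quit() ran."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


def run_scraper(driver, scrape):
    """Run scrape(driver); always quit the driver, never swallow errors."""
    try:
        return scrape(driver)
    finally:
        driver.quit()


def broken_scrape(driver):
    raise ValueError("layout not recognised")


driver = StubDriver()
caught = None
try:
    run_scraper(driver, broken_scrape)
except ValueError as exc:
    caught = exc  # the caller still sees the original error
```

After this runs, driver.quit_called is True and caught holds the ValueError, so the master file can still decide to fall back to the second scraper.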
Upvotes: 1