Reputation: 1170
I have a large file containing thousands of links. I've written a script that calls each link line by line and performs various analyses on the respective webpage. However, sometimes a link is faulty (the article has been removed from the website, etc.), and my whole script just stops at that point.
Is there a way to circumvent this problem? Here's my (pseudo)code:
import urllib2
import lxml.html

for row in file:
    url = row[4]
    req = urllib2.Request(url)
    tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    # perform analyses
    # append analysis results to lists
# output data
I have tried adding

except:
    pass

but it royally messes up the script for some reason.
Upvotes: 0
Views: 38
Reputation: 892
A try/except block is the way to go:
import urllib2
import lxml.html
from urllib2 import URLError

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except URLError:
        # faulty link: skip straight to the next row
        continue
    # perform analyses
    # append analysis results to lists
# output data
The continue statement skips the rest of the loop body once the URL fails, so no analysis is attempted on that row, and the loop restarts at the next iteration.
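As a standalone toy sketch (not part of the code above), this is all continue does in a plain loop:

# toy example: continue skips the rest of the body for the current element
for n in [1, 2, 0, 3]:
    if n == 0:
        continue        # nothing below runs for n == 0
    print 10 / n        # prints 10, 5, 3 (Python 2 integer division)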
Upvotes: 0
Reputation: 2104
Works for me:
import urllib2
import lxml.html
from urllib2 import URLError

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        # perform analyses
        # append analysis results to lists
    except URLError:
        # faulty link: ignore this row and carry on
        pass
# output data
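If you also want to know which links were skipped, one option is to log the failures instead of silently passing. This is a minimal sketch under assumptions not in the original code: the urls list and failed_urls are hypothetical names used only for illustration.

# minimal sketch: record and log skipped URLs rather than ignoring them silently
import logging
import urllib2
from urllib2 import URLError

import lxml.html

logging.basicConfig(level=logging.WARNING)

urls = ["http://example.com/", "http://example.invalid/"]  # hypothetical input
failed_urls = []

for url in urls:
    try:
        tree = lxml.html.fromstring(urllib2.urlopen(urllib2.Request(url)).read())
    except URLError as e:
        logging.warning("skipping %s: %s", url, e)
        failed_urls.append(url)
        continue
    # perform analyses on tree here

That way the script still finishes, but failed_urls tells you afterwards which rows were never analysed.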
Upvotes: 2