Reputation: 886
I'm building a Python script that searches my database for all URLs and then follows each one to find broken links. The script uses exception handling to log any error it hits when opening a link, but it has started hitting an error that I have been completely unable to write an except statement for:
Traceback (most recent call last):
  File "exceptionerror.py", line 97, in <module>
    raw_response = response.read().decode('utf8', errors='ignore')
  File "/usr/lib/python3.4/http/client.py", line 512, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.4/http/client.py", line 662, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.4/socket.py", line 371, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
I've tried the following:
except SocketError as inst:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
    continue
And:
except ConnectionResetError as inst:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
    continue
And even a full generic exception to attempt to catch all errors just so it doesn't kill the whole script:
except:
    print("This link was not caught by defined exceptions: " + articlelinks[j])
    continue
I'm at a complete loss as to how to make my script catch this error so that it can continue checking for broken links rather than failing hard. The error is intermittent, so I don't believe the link is actually broken. Even though I've identified the URL, simply skipping it beforehand feels like cheating, since my goal is to handle exceptions properly. Could someone advise me on how to handle this exception?
For reference, here is my full loop:
for j in range(0, len(articlelinks)):
    try:
        req = urllib.request.Request(articlelinks[j], None, {'User-agent' : 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'})
        response = urllib.request.urlopen(req)
    except urllib.request.HTTPError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except TimeoutError:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' Timeout Error, ' + brokenlinks
        continue
    except urllib.error.URLError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except SocketError as inst:
        brokenlinksflag = 1
        # format(inst) gives a string; sys.exc_info()[0] is a class and cannot be concatenated to str
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except:
        print("This article killed everything: " + articlelinks[j])
        exit()
Upvotes: 15
Views: 29736
Reputation: 886
Solved! I had been troubleshooting the connection in order to handle the ConnectionResetError, but a closer look at the full traceback showed that the error was raised while processing the response, not while opening the URL:
  File "exceptionerror.py", line 97, in <module>
    raw_response = response.read().decode('utf8', errors='ignore')
Because the connection was reset rather than completely terminated, the script was able to open the URL successfully, and the error was raised afterwards, while reading the response body (the response.read() call on line 97). In other words, the try/except block was wrapped around the wrong lines.
The following resolved the issue:
try:
    raw_response = response.read().decode('utf8', errors='ignore')
except ConnectionResetError:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ConnectionResetError, ' + brokenlinks
    continue
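As a rough sketch of the same idea, the open and the read can also share a single try block so that neither call is left unguarded. The function name check_link and its opener parameter are hypothetical and not part of my original script; opener is there only so the function can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen):
    """Return None if the page was opened and read, else a short error string.

    check_link and opener are illustrative names (assumptions for this sketch).
    """
    try:
        # Both calls sit inside the try block, because either one can
        # raise a network error: urlopen() while connecting, read()
        # while the body is being transferred.
        response = opener(url)
        response.read().decode('utf8', errors='ignore')
    except urllib.error.HTTPError as inst:
        # HTTPError is a subclass of URLError, so it must be listed first.
        return format(inst)
    except urllib.error.URLError as inst:
        return format(inst)
    except ConnectionResetError:
        # Raised directly by read() when the peer resets the connection.
        return 'ConnectionResetError'
    except OSError as inst:
        # Other socket-level failures, including TimeoutError.
        return type(inst).__name__
    return None
```

In the loop, a non-None result from check_link(articlelinks[j]) would be appended to brokenlinks in place of the separate except branches. ConnectionResetError is a subclass of OSError, and socket.error has been an alias of OSError since Python 3.3, so an except clause for either of those would also catch it, provided the failing call is actually inside the try block.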
Upvotes: 13