David Scott

Reputation: 886

Catching ConnectionResetError with Python

I'm building a Python script that searches my database for all URLs and then follows each one to find broken links. The script uses exception handling to log any error it encounters when opening a link. However, it has started hitting an error that I have been completely unable to write an except statement for:

Traceback (most recent call last):
  File "exceptionerror.py", line 97, in <module>
    raw_response = response.read().decode('utf8', errors='ignore')
  File "/usr/lib/python3.4/http/client.py", line 512, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.4/http/client.py", line 662, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.4/socket.py", line 371, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

I've tried the following:

except SocketError as inst:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
    continue

And:

except ConnectionResetError as inst:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
    continue

And even a full generic exception to attempt to catch all errors just so it doesn't kill the whole script:

except:
    print("This link was not caught by defined exceptions: " + articlelinks[j])
    continue

I'm at a complete loss as to how to make my script catch this error so that it can continue checking for broken links rather than failing hard. The error is intermittent, so I don't believe the link is broken. Even though I've identified the URL, simply detecting it and skipping it beforehand feels like cheating, since my goal is to handle exceptions properly. Could someone advise me on how to handle this exception?

For reference, here is my full loop:

for j in range(0, len(articlelinks)):
    try:
        req=urllib.request.Request(articlelinks[j], None, {'User-agent' : 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'})
        response = urllib.request.urlopen(req)
    except urllib.request.HTTPError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except TimeoutError:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' Timeout Error, ' + brokenlinks
        continue
    except urllib.error.URLError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except SocketError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except:
        print("This article killed everything: " + articlelinks[j])
        exit()

Upvotes: 15

Views: 29736

Answers (1)

David Scott

Reputation: 886

Solved! I had been troubleshooting the connection step in order to handle the ConnectionResetError. However, a more careful reading of the full traceback showed that the error was raised while processing the response rather than while opening the URL:

  File "exceptionerror.py", line 97, in <module>
    raw_response = response.read().decode('utf8', errors='ignore')

Because the connection was reset only after it had been established, rather than being refused outright, the script was able to open the URL successfully. The error was raised later, while reading the response body, which means the try/except block was wrapped around the wrong lines.

The following resolved the issue:

try:
    raw_response = response.read().decode('utf8', errors='ignore')
except ConnectionResetError:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ConnectionResetError, ' + brokenlinks
    continue
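
For anyone who wants a single place to handle both failure points, here is a minimal sketch that puts the open and the read inside the same try block. The check_link helper and its opener parameter are hypothetical names introduced only for illustration (they are not part of my original script); the opener argument exists so the function can be exercised without a network. It relies on ConnectionResetError, TimeoutError and urllib.error.URLError all being subclasses of OSError in Python 3.

```python
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen):
    """Return (ok, detail) for one URL.

    Hypothetical helper for illustration; `opener` is an assumption
    added so the function can be tested with a fake response.
    """
    try:
        # Both steps can fail: opening the URL and reading the body.
        response = opener(url)
        body = response.read().decode('utf8', errors='ignore')
    except urllib.error.HTTPError as inst:
        return False, 'HTTPError: ' + format(inst)
    except ConnectionResetError as inst:
        # Raised here when the peer resets the connection mid-read.
        return False, 'ConnectionResetError: ' + format(inst)
    except OSError as inst:
        # URLError, TimeoutError and other socket errors are OSError subclasses.
        return False, type(inst).__name__ + ': ' + format(inst)
    return True, body
```

In the loop, this could be called as `ok, detail = check_link(articlelinks[j])`, appending `articlelinks[j] + ' ' + detail` to brokenlinks whenever ok is False. The specific handlers come before the generic OSError one so that the more informative label wins.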

Upvotes: 13
