ferada

Process hangs on urllib2 socket reset

We have a server program which occasionally hangs in a read call on a urllib2 socket when getting a connection reset, like so:

Traceback (most recent call last):
  File "run.py", line 112, in fetch_stuff
    raw = response.read()
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 573, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 104] Connection reset by peer

Edit: By "hang" I mean that the program doesn't crash and is still running a couple of hours later; however, it seems to be stuck right after printing that one error message.

However, as far as I know, the code outside of the library handles exceptions correctly:

import logging
import sys
import urllib2

for i in range(retries):
    try:
        response = urllib2.urlopen(url)
        raw = response.read()  # fails here
        ...
    except urllib2.HTTPError as e:
        logging.error("HTTP Error for url=%s (code=%s, message=%s, headers=%s)",
                      url, e.code, e.msg, e.hdrs)
    except Exception as e:
        logging.exception(e)
else:
    # for/else: runs only if the loop was never left via `break`
    logging.error("Connection failed after %s tries", retries)
    sys.exit(0)

I can't see why this would hang the entire process with no further progress. We're now trying to set the timeout parameter of urlopen, but I doubt that this will fix the issue.

Since I haven't found anything useful so far (except maybe this answer): is there an (obvious) fix for this, should we use another library, ...?

Also, what actually happens? I get that the connection is reset, but what happens next?

Answers (1)

Marco Guerri

The read() call blocks unless you are working with a non-blocking socket, so your process waits inside read() until data arrives, the peer closes the connection, or an error occurs.
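The difference between the socket modes can be seen with a local socket pair. This is a minimal sketch, assuming a Unix-like system where socket.socketpair() is available; the helper try_recv is hypothetical (not part of any library), and the code is written to run under both Python 2.7 and Python 3. In non-blocking mode recv() fails immediately with EAGAIN/EWOULDBLOCK when there is nothing to read, in timeout mode it gives up with socket.timeout after the interval, and in the default blocking mode (not exercised here, since it would wait forever) it simply keeps waiting.

```python
import errno
import socket


def try_recv(sock, bufsize=1024):
    """Hypothetical helper: return received data, or None if nothing could be read."""
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        # Timeout mode: recv() waited for the configured interval, then gave up
        return None
    except socket.error as e:
        # Non-blocking mode: recv() returns at once with EAGAIN/EWOULDBLOCK
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise  # anything else (e.g. ECONNRESET) is a real error


a, b = socket.socketpair()
a.setblocking(False)   # non-blocking: never waits
print(try_recv(a))     # None, nothing has been sent yet
a.settimeout(0.2)      # timeout mode: waits at most 0.2 s
print(try_recv(a))     # None again, after roughly 0.2 s
b.sendall(b"hello")
print(try_recv(a))     # the data sent by the other end
a.close()
b.close()
```

Note that a timeout only bounds how long recv() waits when nothing happens; it does not change what happens when the peer actively resets the connection.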

For some reason, the other side of the connection sends a packet with the RST flag set, aborting the connection. When the kernel processes this segment, the recv() system call on your socket fails: it returns -1 and sets errno to ECONNRESET, which is error code 104 on Linux (defined in the kernel's errno headers).

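Python turns the failed call into an exception: a socket.error (a subclass of IOError since Python 2.6) whose errno attribute holds that code. The errno module (https://docs.python.org/2/library/errno.html#module-errno) provides symbolic names for these codes, and 104 is, as expected, errno.ECONNRESET: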

>>> import errno
>>> print errno.ECONNRESET
104
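In code, a reset can be told apart from other socket failures by comparing the exception's errno attribute with the errno constants. A small sketch, written to run under Python 2.7 and Python 3; the helper describe_socket_error is hypothetical, and the exception is built by hand rather than taken from a real connection:

```python
import errno
import os
import socket


def describe_socket_error(exc):
    """Hypothetical helper: return (symbolic errno name, OS message) for a socket error."""
    name = errno.errorcode.get(exc.errno, "unknown")
    return name, os.strerror(exc.errno)


# Constructed by hand here; a real reset raises an equivalent exception from recv()
err = socket.error(errno.ECONNRESET, os.strerror(errno.ECONNRESET))
print(err.errno == errno.ECONNRESET)  # True
print(describe_socket_error(err))     # on Linux: ('ECONNRESET', 'Connection reset by peer')
```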

You are then catching that exception and calling

logging.exception(e)

which logs the error together with its stack trace. After that, the loop either moves on to the next attempt or, once all retries are used up, runs the else branch. Given your output, it is not clear to me which of these happens.
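To make that control flow explicit, here is a sketch of a retry loop that logs a reset and continues with the next attempt. It assumes the download can be wrapped in a zero-argument callable (for example `lambda: urllib2.urlopen(url).read()`); the name fetch_with_retries is hypothetical, and the code runs under Python 2.7 and Python 3. Returning from inside the loop plays the role of `break`, so the code after the loop is reached only when every attempt failed, just like the question's else branch.

```python
import errno
import logging
import socket


def fetch_with_retries(fetch, retries):
    """Hypothetical helper: call fetch() up to `retries` times.

    Returns fetch()'s result on the first success, or None if every attempt
    raised a socket error.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()  # success: leaves the loop immediately, like `break`
        except socket.error as e:
            if e.errno == errno.ECONNRESET:
                logging.warning("attempt %d: connection reset by peer", attempt)
            else:
                logging.exception("attempt %d failed", attempt)
    # Reached only when every attempt raised (the for/else case in the question)
    logging.error("Connection failed after %d tries", retries)
    return None
```

A fake fetch that raises socket.error(errno.ECONNRESET, ...) twice and then returns data succeeds on the third attempt; one that always raises makes the helper return None after the last attempt.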

This is easy to reproduce. A very simple client:

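import urllib2
import logging

r = urllib2.urlopen("http://localhost:8080")  # connect to the nc listener below
try:
    print "Reading!"
    r.read()  # blocks until data arrives or the connection fails
except Exception as e:
    logging.exception(e)  # logs the message plus the full traceback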

On the server side, directly from the command line:

$ nc -l -p 8080

Once the connection is established, the client blocks in the read() call. tcpkill can then be used to kill the connection with an RST as soon as it sees traffic on the port:

$ sudo tcpkill -i lo port 8080

And, as expected, the result on the client side is:

$ python m.py
Reading!
ERROR:root:[Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "m.py", line 7, in <module>
    r.read()
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 1302, in read
    return s + self._file.read(amt - len(s))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 104] Connection reset by peer

Adding a timeout would not solve much. If the connection is reset while your process is blocked in the read call, the outcome is exactly the same with or without a timeout: recv() fails with ECONNRESET. I would first try to understand why the connection is being reset. Regardless, a read on a socket that was closed with an RST is an event you cannot rule out, so your code has to handle it.
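The point that a timeout does not change the outcome can be checked locally without nc or tcpkill. This sketch uses plain sockets instead of urllib2 (an assumption made to keep it self-contained) and assumes Linux or another Unix-like system, where `struct.pack("ii", 1, 0)` matches the layout of `struct linger`. Setting SO_LINGER with a linger time of 0 makes the server's close() abort the connection with an RST instead of a normal FIN. Even with a timeout on the client, recv() fails with ECONNRESET right away rather than timing out. The function name read_after_reset is hypothetical; the code runs under Python 2.7 and Python 3.

```python
import errno
import socket
import struct
import time


def read_after_reset(timeout):
    """Hypothetical helper: abort a loopback connection from the server side
    with an RST and return (errno, seconds spent in recv) seen by the client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    client.settimeout(timeout)  # a timeout does not prevent the reset error
    server, _ = listener.accept()
    # SO_LINGER with l_onoff=1, l_linger=0: close() sends RST instead of FIN
    server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    server.close()
    start = time.time()
    try:
        client.recv(1024)
        code = None  # only reached if no error was raised
    except socket.error as e:
        code = e.errno  # None for socket.timeout, ECONNRESET for a reset
    finally:
        client.close()
        listener.close()
    return code, time.time() - start


code, elapsed = read_after_reset(timeout=5.0)
print("errno=%s (%s) after %.3f s" % (code, errno.errorcode.get(code), elapsed))
```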
