We have a server program which occasionally hangs in a read call on a urllib2 socket when it gets a connection reset, like so:
Traceback (most recent call last):
  File "run.py", line 112, in fetch_stuff
    raw = response.read()
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 573, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 104] Connection reset by peer
Edit: By "hang" I mean that the program doesn't crash and is still running a couple of hours later; however, it seems to be stuck after printing that one error message.
However, as far as I can tell, our code outside the library handles exceptions correctly:
for i in range(retries):
    try:
        response = urllib2.urlopen(url)
        raw = response.read()  # fails here
        ...
    except urllib2.HTTPError as e:
        logging.error("HTTP Error for url=%s (code=%s, message=%s, headers=%s)" % (url, e.code, e.msg, e.hdrs))
    except Exception as e:
        logging.exception(e)
else:
    logging.error(('Connection failed after {} tries').format(retries))
    sys.exit(0)
I can't see why this would hang the entire process with no further progress. We're now trying to set the timeout parameter of urlopen, but I doubt that will fix the issue.
So, since I've found no useful links thus far (except maybe this answer), is there an (obvious) fix for this, should we use another library, ...?
Also, what actually happens? I get that the connection is reset, but what happens next?
The read call is blocking unless you are working on a non-blocking socket. Therefore, your process is blocked on the read() call.
For some reason, the other side of the connection sends a packet with the RST flag set, aborting the connection. When the kernel processes this packet, the recv system call fails with errno set to ECONNRESET, which on Linux is error code 104 (defined in include/uapi/asm-generic/errno.h in the kernel sources).
Python turns the failed system call into a socket.error exception whose errno attribute holds that code; the errno module (https://docs.python.org/2/library/errno.html#module-errno) provides the symbolic names for these codes. Error code 104 is, as expected, errno.ECONNRESET:
>>> import errno
>>> print errno.ECONNRESET
104
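In your own handler you can test for this specific case by comparing the exception's errno attribute with errno.ECONNRESET instead of the literal 104, which is platform-specific. A minimal sketch; the helper name is_connection_reset is hypothetical, and it only assumes that socket.error carries an errno attribute (true in Python 2.7, where socket.error subclasses IOError, and in Python 3, where it is an alias of OSError):

```python
import errno
import socket


def is_connection_reset(exc):
    """Return True if `exc` is a socket error caused by a TCP reset (ECONNRESET)."""
    # socket.error is an IOError subclass in Python 2.7 and an alias of OSError
    # in Python 3; in both cases the OS error code is stored in `errno`.
    return isinstance(exc, socket.error) and exc.errno == errno.ECONNRESET
```

Inside an `except Exception as e:` block, `is_connection_reset(e)` then distinguishes a reset from other failures without parsing the log message.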
You are then catching that exception and calling logging.exception(e), which logs the message together with the stack trace. Afterwards, either the loop moves on to the next attempt or, once all retries are used up, the else branch runs. Given your output, it is not clear to me which of these happens.
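Logging the attempt number, and returning explicitly on success, makes it obvious from the output which path the program took. A sketch under these assumptions: fetch_with_retries and its fetch argument are hypothetical names, fetch stands in for something like urllib2.urlopen(url).read(), and only socket-level errors are caught (in Python 3, socket.error is OSError, which also covers URLError):

```python
import logging
import socket


def fetch_with_retries(fetch, url, retries):
    """Call fetch(url) up to `retries` times and return its first successful result.

    `fetch` is any callable returning the response body (hypothetically a
    wrapper around urlopen(url).read()). Returns None if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            body = fetch(url)
        except socket.error as e:
            # Log the attempt number so the output shows whether the loop is
            # still retrying or has given up.
            logging.warning("Attempt %d/%d for %s failed: %s", attempt, retries, url, e)
        else:
            return body  # success: leave the loop immediately
    # Only reached when every attempt raised.
    logging.error("Connection failed after %d tries", retries)
    return None
```

Compared with the for/else in the question, the "give up" path is simply the code after the loop, and success cannot fall through into it.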
This is easy to reproduce. Here is a very simple client:
import urllib2
import logging

r = urllib2.urlopen("http://localhost:8080")
try:
    print "Reading!"
    r.read()
except Exception as e:
    logging.exception(e)
On the server side, directly from the command line:
➜ ~ [1] at 22:50:53 [Wed 12] $ nc -l -p 8080
Once the connection is established, the client blocks on the read call. tcpkill can be used to kill the connection with a RST flag once some traffic is detected:
~ [1] at 22:51:19 [Wed 12] $ sudo tcpkill -i lo port 8080
And, as expected, the result on the client side is:
➜ ~ [1] at 23:12:37 [Wed 12] $ python m.py
Reading!
ERROR:root:[Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "m.py", line 7, in <module>
    r.read()
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 1302, in read
    return s + self._file.read(amt - len(s))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 104] Connection reset by peer
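If tcpkill is not at hand (it needs root), the same event can be reproduced inside a single Python process. This sketch swaps in a different technique: instead of injecting a RST from outside, the server side enables SO_LINGER with a zero linger time, which makes close() abort the connection with a RST instead of a normal FIN. The function names are hypothetical; only the standard socket, struct, threading and errno modules are used:

```python
import errno
import socket
import struct
import threading


def _accept_and_reset(server):
    """Accept one connection, then abort it so the kernel sends RST instead of FIN."""
    conn, _ = server.accept()
    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() discard the
    # connection and send a RST segment.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def reproduce_reset():
    """Return the symbolic errno name raised by a client recv() on a reset connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=_accept_and_reset, args=(server,))
    worker.start()
    client = socket.create_connection(("127.0.0.1", port))
    try:
        client.recv(1024)  # blocks until the RST arrives, like r.read() above
    except socket.error as e:
        return errno.errorcode.get(e.errno)
    finally:
        client.close()
        worker.join()
        server.close()
    return None  # recv returned normally: no reset was observed
```

On Linux, calling reproduce_reset() should return "ECONNRESET", the same error the urllib2 client reports.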
Adding a timeout would not solve much: if the connection is reset while your process is blocked in the read call (with or without a timeout), the outcome is exactly the same. I think you should first try to understand why the connection is being reset. Even so, a read on a socket that has been closed with a RST is an event you can't always avoid, so you should handle it.
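Handling the reset mostly means treating it as an ordinary failure of that request and making sure the connection is released. A minimal sketch, assuming response is the file-like object returned by urllib2.urlopen (anything with read(n) and close() works); read_body is a hypothetical helper, and discarding a partial body on reset is a design choice, not a requirement:

```python
import errno
import logging
import socket


def read_body(response, chunk_size=8192):
    """Read `response` to the end and always close it.

    Returns the body, or None if the peer reset the connection mid-read.
    Other socket errors (including timeouts) are re-raised.
    """
    chunks = []
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty read: end of body
                break
            chunks.append(chunk)
    except socket.error as e:
        if e.errno != errno.ECONNRESET:
            raise  # only swallow resets; let other socket errors propagate
        logging.warning("Connection reset after %d bytes", sum(len(c) for c in chunks))
        return None
    finally:
        response.close()  # runs on success, on reset, and on re-raised errors
    return b"".join(chunks)
```

The caller can then retry on None, while unexpected errors still surface with their full traceback.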