Reputation: 6740
I'm running into a problem in that urllib2.urlopen / requests.post is very occasionally blocking forever on socket.recv and never returning.
I'm trying to find out why this is happening and address that problem, but in the meantime I wondered if there was a way of preventing it from blocking forever?
I already know about the optional timeout argument for urllib2.urlopen and about socket.setdefaulttimeout, but unfortunately a timeout isn't a solution for my use case: I'm uploading files with POST, so any timeout value I use would risk interrupting a normal file upload.
I've also seen some solutions using signals, but these have the same problem for me as timeouts (and are also out of the question because I'm not doing this from the main thread).
Is it possible to time out only if no data has been sent/received through the socket for a certain amount of time, perhaps? Or maybe there's some way I can use select / poll to prevent the deadlock / blocking that I'm experiencing?
If there is a solution using select / poll, how would I go about incorporating this into urllib2.urlopen / requests.post?
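At the raw socket level I can picture something like this (a rough sketch; recv_with_idle_timeout is my own made-up helper), but I don't see how to hook it into those libraries:

import select
import socket

def recv_with_idle_timeout(sock, idle_timeout, bufsize=4096):
    # Wait until the socket is readable, or give up if nothing
    # arrives within idle_timeout seconds.
    ready, _, _ = select.select([sock], [], [], idle_timeout)
    if not ready:
        raise socket.timeout("no data for %s seconds" % idle_timeout)
    return sock.recv(bufsize)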
I also had the idea that if I could send the file data through a write-style interface, so that I controlled iterating over the file and sending it a chunk at a time, I'd probably have enough control to avoid the stalls. I'm not sure how to achieve that though, so I asked the question: Upload a file with a file.write interface
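Something along these lines is what I'm picturing (a rough sketch; the URL and chunk size are made up, and I haven't verified that requests handles a generator body the way I'd want):

import requests

def file_chunks(path, chunk_size=8192):
    # Yield the file one chunk at a time so the caller controls the pacing.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

requests.post("http://example.com/upload", data=file_chunks("upload.bin"))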
UPDATE
It seems I've always had a misconception of the meaning of timeout in Python: it's actually an idle timeout, i.e. a read/write timeout (probably the first time I've disagreed with Guido). I always thought it was the maximum amount of time the response should take to return - thank you @tomasz for pointing this out!
But after adding timeout parameters (tested with both urllib2 and requests) I've come across some really odd and subtle scenarios, possibly Mac-specific, where the timeout doesn't work correctly, which I'm getting more and more inclined to believe is a bug. I'm going to continue to investigate and find out exactly what the issue is. Again, thank you tomasz for your help with this!
Upvotes: 3
Views: 4035
Reputation: 13072
I believe you could get rid of the hanging states by tweaking your TCP settings at the OS level, but assuming your application is not going to run on a dedicated machine (maintained by you), you should seek a more general solution.
You asked:
Is it possible to time out only if no data has been sent/received through the socket for a certain amount of time, perhaps?
And this is exactly the behaviour that socket.settimeout (or the timeout passed to urllib2) gives you. In contrast to a timeout based on SIGALRM (which would fire even during a slow data transfer), the timeout passed to the socket occurs only if no data has been transmitted during the defined period. A call to socket.send or socket.recv should return a partial count if some, but not all, of the data was transmitted during the period, and urllib2 would then use a subsequent call to transmit the remaining data.
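At the raw socket level, the behaviour looks like this (a sketch; host, port and sizes are arbitrary):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(4)                   # idle timeout, applied to each send/recv call
s.connect(("localhost", 12346))
try:
    sent = s.send("x" * 1000000)  # may transmit only part of the buffer...
    print "sent %d bytes" % sent  # ...returning a partial count; the caller loops
except socket.timeout:
    print "nothing transmitted for 4 seconds"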
That said, your POST call could still be terminated somewhere in the middle of the upload if it is executed as more than one send call and any call other than the first blocks and times out without sending any data. You gave the impression this wouldn't be handled properly by your application, but I think it should be, as it would be similar to a forceful termination of the process or simply a dropped connection.
Have you tested and confirmed that socket.settimeout doesn't solve your problem? Or were you just unsure how the behaviour is implemented? If the former, could you please give some more details? I'm quite sure you're safe just setting the timeout, as Python simply uses the low-level BSD socket implementation, where the behaviour is as described above. For more references, take a look at the setsockopt man page and the SO_RCVTIMEO and SO_SNDTIMEO options. I'd expect socket.settimeout to use exactly these functions and options.
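For reference, setting those options directly from Python would look roughly like this (a sketch; the struct layout of struct timeval is platform-dependent):

import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# struct timeval {tv_sec, tv_usec}: 4 seconds, 0 microseconds
# (packing as two longs matches common 64-bit Unix layouts; verify per platform)
timeval = struct.pack("ll", 4, 0)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)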
--- EDIT --- (to provide some test code)
So I was able to get the Requests module and test the behaviour along with urllib2. I ran a server which received blocks of data with an increasing interval between every recv call. As expected, the client timed out when the interval reached the specified timeout. Example:
Server
import socket
import time
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("localhost", 12346))
listener.listen(1)
sock,_ = listener.accept()
interval = 0.5
while 1:
    interval += 1  # increase interval by 1 second
    time.sleep(interval)
    # Get 1MB but will be really limited by the buffer
    data = sock.recv(1000000)
    print interval, len(data)
    if not data:
        break
Client (Requests module)
import requests
data = "x"*100000000 # 100MB beefy chunk
requests.post("http://localhost:12346", data=data, timeout=4)
Client (urllib2 module)
import urllib2
data = "x"*100000000 # 100MB beefy chunk
urllib2.urlopen("http://localhost:12346", data=data, timeout=4)
Output (Server)
> 1.5 522832
> 2.5 645816
> 3.5 646180
> 4.5 637832 <--- Here the client dies (4.5 seconds without data transfer)
> 5.5 294444
> 6.5 0
Both clients raised an exception:
# urllib2
URLError: timeout('timed out',)
# Requests
Timeout: TimeoutError("HTTPConnectionPool(host='localhost', port=12346): Request timed out. (timeout=4)",)
Everything works as expected! When not passing a timeout as an argument, urllib2 also reacted well to socket.setdefaulttimeout; however, Requests did not. That's no surprise, as the internal implementation doesn't need to use the default value at all and could simply overwrite it depending on the passed argument, or use non-blocking sockets.
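The setdefaulttimeout variant of the test above looks roughly like this:

import socket
import urllib2

socket.setdefaulttimeout(4)   # picked up by urllib2; Requests ignored it in my test
data = "x" * 100000000        # same 100MB payload as above
urllib2.urlopen("http://localhost:12346", data=data)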
I've been running this using the following:
OSX 10.8.3
Python 2.7.2
Requests 1.1.0
Upvotes: 6
Reputation: 8721
You mention that the indefinite blocking happens "very occasionally", and that you're looking for a fallback to avoid failing file uploads when this happens. In this case, I recommend using a timeout for your post calls, and retrying the post in case of timeouts. All this requires is a simple for loop, with a break if anything happens other than a timeout.
Of course, you should log a warning message when this happens, and monitor how often this happens. And you should try to find the underlying cause of the freezes (as you mentioned you intend to).
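That loop might look like this (a rough sketch; the URL, timeout value, and retry count are placeholders to adapt):

import logging
import requests

MAX_RETRIES = 3       # placeholder: tune to your failure rate
data = "x" * 1000000  # placeholder payload

for attempt in range(MAX_RETRIES):
    try:
        requests.post("http://example.com/upload", data=data, timeout=30)
        break  # success: stop retrying; non-timeout errors propagate
    except requests.exceptions.Timeout:
        # log and retry, as suggested above
        logging.warning("upload timed out (attempt %d of %d)",
                        attempt + 1, MAX_RETRIES)
else:
    raise RuntimeError("upload failed after %d timeouts" % MAX_RETRIES)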
Upvotes: 1
Reputation: 612
One possible approach: you could wrap your urllib2 request in a block with SIGALRM signal handling, or put it into a thread that is forcibly stopped on timeout. This will force your request to stop after the timeout, regardless of any internal urllib2 problem. There's an older question about this case: Python: kill or terminate subprocess when timeout
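A rough sketch of the SIGALRM variant (only usable from the main thread, which the question above rules out; the URL and 60-second hard deadline are arbitrary placeholders):

import signal
import urllib2

class HardTimeout(Exception):
    pass

def on_alarm(signum, frame):
    # raising here aborts whatever urllib2 is blocked on
    raise HardTimeout("request exceeded hard deadline")

signal.signal(signal.SIGALRM, on_alarm)
signal.alarm(60)     # hard deadline for the whole request
try:
    urllib2.urlopen("http://example.com/upload", data="x" * 1000000)
finally:
    signal.alarm(0)  # always cancel the pending alarm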
Upvotes: 0