Reputation: 275
I'm using the requests library to get a lot of webpages from somewhere. Here's the pertinent code:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

response = requests.Session()
retries = Retry(total=5, backoff_factor=0.1)
response.mount('http://', HTTPAdapter(max_retries=retries))
response = response.get(url)
After a while it just hangs/freezes (never on the same webpage) while getting the page. Here's the traceback when I interrupt it:
  File "/Users/Student/Hockey/Scrape/html_pbp.py", line 21, in get_pbp
    response = r.read().decode('utf-8')
  File "/anaconda/lib/python3.6/http/client.py", line 456, in read
    return self._readall_chunked()
  File "/anaconda/lib/python3.6/http/client.py", line 566, in _readall_chunked
    value.append(self._safe_read(chunk_left))
  File "/anaconda/lib/python3.6/http/client.py", line 612, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/anaconda/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt
Does anybody know what could be causing it? Or (more importantly) does anybody know a way to stop it if it takes more than a certain amount of time so that I could try again?
Upvotes: 22
Views: 48169
Reputation: 12877
Patching the documented "send" function will fix this for all requests, even in many dependent libraries and SDKs. When patching a library, be sure to patch supported/documented functions; otherwise you may wind up silently losing the effect of your patch.
import requests

DEFAULT_TIMEOUT = 180

old_send = requests.Session.send

def new_send(*args, **kwargs):
    if kwargs.get("timeout", None) is None:
        kwargs["timeout"] = DEFAULT_TIMEOUT
    return old_send(*args, **kwargs)

requests.Session.send = new_send
The effects of not having any timeout are quite severe, and a default timeout can almost never break anything, because TCP itself has timeouts as well.
On Windows the default TCP timeout is 240 seconds; the TCP RFCs recommend a minimum of 100 seconds for RTO * retries. Somewhere in that range is a safe default.
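One way to confirm the patch actually injects the default before trusting it in production is to check it against a stub instead of the real network-facing send. This is a sketch, written as a wrapper variant of the patch above; the helper and stub names are purely illustrative:

```python
import requests

DEFAULT_TIMEOUT = 180

def with_default_timeout(send):
    """Wrap a send-style callable so a missing timeout gets a default."""
    def new_send(self, request, **kwargs):
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = DEFAULT_TIMEOUT
        return send(self, request, **kwargs)
    return new_send

# Sanity-check the wrapper against a stub, so no network is involved.
captured = {}
def stub_send(self, request, **kwargs):
    captured.update(kwargs)

with_default_timeout(stub_send)(requests.Session(), None)
print(captured["timeout"])  # 180
```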
Upvotes: 4
Reputation: 9845
To set a timeout globally instead of specifying it in every request:

import os

import requests
from requests.adapters import TimeoutSauce

REQUESTS_TIMEOUT_SECONDS = float(os.getenv("REQUESTS_TIMEOUT_SECONDS", 5))

class CustomTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs.get("connect") is None:
            kwargs["connect"] = REQUESTS_TIMEOUT_SECONDS
        if kwargs.get("read") is None:
            kwargs["read"] = REQUESTS_TIMEOUT_SECONDS
        super().__init__(*args, **kwargs)

# Set it globally, instead of specifying ``timeout=..`` kwarg on each call.
requests.adapters.TimeoutSauce = CustomTimeout

sess = requests.Session()
sess.get(...)
sess.post(...)
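Because TimeoutSauce is a re-export of urllib3's Timeout class, the subclass can be sanity-checked directly, with no network involved. A minimal sketch, using 5 seconds as an illustrative default in place of the environment lookup above:

```python
from requests.adapters import TimeoutSauce

REQUESTS_TIMEOUT_SECONDS = 5.0

class CustomTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        # Fill in the default only when no explicit value was given.
        if kwargs.get("connect") is None:
            kwargs["connect"] = REQUESTS_TIMEOUT_SECONDS
        if kwargs.get("read") is None:
            kwargs["read"] = REQUESTS_TIMEOUT_SECONDS
        super().__init__(*args, **kwargs)

t = CustomTimeout(connect=None, read=None)
print(t.connect_timeout, t.read_timeout)  # 5.0 5.0
```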
Upvotes: 1
Reputation: 18687
Seems like setting a (read) timeout might help you.
Something along the lines of:
response = response.get(url, timeout=5)
(This will set both the connect and the read timeout to 5 seconds.)
In requests, unfortunately, neither connect nor read timeouts are set by default, even though the docs say it's good to set one:
Most requests to external servers should have a timeout attached, in case the server is not responding in a timely manner. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more.
Just for completeness: the connect timeout is the number of seconds requests will wait for your client to establish a connection to a remote machine, and the read timeout is the number of seconds the client will wait between bytes sent from the server.
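The two timeouts can also be set independently by passing a (connect, read) tuple. A minimal, network-free sketch of the read timeout in action, using a throwaway local server that stalls on purpose (the handler and port choice are purely illustrative):

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # stall for longer than the client's read timeout

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

try:
    # (connect, read) tuple: 1 s to establish the connection,
    # then at most 0.5 s of silence while reading the response.
    requests.get(url, timeout=(1, 0.5))
    timed_out = False
except requests.exceptions.ReadTimeout:
    timed_out = True

server.shutdown()
print(timed_out)  # True: the stalled server trips the read timeout
```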
Upvotes: 36