Hobbit36
Hobbit36

Reputation: 275

Python Requests hanging/freezing

I'm using the requests library to get a lot of webpages from somewhere. He's the pertinent code:

response = requests.Session()
retries = Retry(total=5, backoff_factor=.1)
response.mount('http://', HTTPAdapter(max_retries=retries))
response = response.get(url)

After a while it just hangs/freezes (never on the same webpage) while getting the page. Here's the traceback when I interrupt it:

File "/Users/Student/Hockey/Scrape/html_pbp.py", line 21, in get_pbp
  response = r.read().decode('utf-8')
File "/anaconda/lib/python3.6/http/client.py", line 456, in read
  return self._readall_chunked()
File "/anaconda/lib/python3.6/http/client.py", line 566, in _readall_chunked
  value.append(self._safe_read(chunk_left))
File "/anaconda/lib/python3.6/http/client.py", line 612, in _safe_read
  chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/anaconda/lib/python3.6/socket.py", line 586, in readinto
  return self._sock.recv_into(b)
KeyboardInterrupt

Does anybody know what could be causing it? Or (more importantly) does anybody know a way to stop it if it takes more than a certain amount of time so that I could try again?

Upvotes: 22

Views: 48169

Answers (3)

Erik Aronesty
Erik Aronesty

Reputation: 12877

Patching the documented "send" function will fix this for all requests - even in many dependent libraries and sdk's. When patching libs, be sure to patch supported/documented functions, otherwise you may wind up silently losing the effect of your patch.

import requests

DEFAULT_TIMEOUT = 180

old_send = requests.Session.send

def new_send(*args, **kwargs):
     if kwargs.get("timeout", None) is None:
         kwargs["timeout"] = DEFAULT_TIMEOUT
     return old_send(*args, **kwargs)

requests.Session.send = new_send

The effects of not having any timeout are quite severe, and the use of a default timeout can almost never break anything - because TCP itself has timeouts as well.

On Windows the default TCP timeout is 240 seconds, TCP RFC recommend a minimum of 100 seconds for RTO*retry. Somewhere in that range is a safe default.

Upvotes: 4

Danila Vershinin
Danila Vershinin

Reputation: 9845

To set timeout globally instead of specifying in every request:


from requests.adapters import TimeoutSauce

REQUESTS_TIMEOUT_SECONDS = float(os.getenv("REQUESTS_TIMEOUT_SECONDS", 5))

class CustomTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs["connect"] is None:
            kwargs["connect"] = REQUESTS_TIMEOUT_SECONDS
        if kwargs["read"] is None:
            kwargs["read"] = REQUESTS_TIMEOUT_SECONDS
        super().__init__(*args, **kwargs)


# Set it globally, instead of specifying ``timeout=..`` kwarg on each call.
requests.adapters.TimeoutSauce = CustomTimeout


sess = requests.Session()
sess.get(...)
sess.post(...)

Upvotes: 1

randomir
randomir

Reputation: 18687

Seems like setting a (read) timeout might help you.

Something along the lines of:

response = response.get(url, timeout=5)

(This will set both connect and read timeout to 5 seconds.)

In requests, unfortunately, neither connect nor read timeouts are set by default, even though the docs say it's good to set it:

Most requests to external servers should have a timeout attached, in case the server is not responding in a timely manner. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more.

Just for completeness, the connect timeout is the number of seconds requests will wait for your client to establish a connection to a remote machine, and the read timeout is the number of seconds the client will wait between bytes sent from the server.

Upvotes: 36

Related Questions