Reputation: 1854
When I request the URL http://www.trouverlesmots.com, I get the following response back (headers included):
{'headers': HTTPHeaderDict({'Date': 'Wed, 20 Nov 2019 18:40:39 GMT', 'Server': 'Apache/2.4.41 (Unix)', 'X-Powered-By': 'PHP/7.1.33', 'Expires': 'Wed, 11 Jan 1984 05:00:00 GMT', 'Cache-Control': 'no-cache, must-revalidate, max-age=0', 'Retry-After': '86400', 'Vary': 'User-Agent', 'Connection': 'close', 'Transfer-Encoding': 'chunked', 'Content-Type': 'text/html; charset=UTF-8'}), 'status': 503, 'version': 11, 'reason': 'Service Temporarily Unavailable', 'strict': 0, 'decode_content': False, 'retries': Retry(total=2, connect=None, read=None, redirect=None, status=None), 'enforce_content_length': False, 'auto_close': True, '_decoder': None, '_body': None, '_fp': <http.client.HTTPResponse object at 0x7f2588117940>, '_original_response': <http.client.HTTPResponse object at 0x7f2588117940>, '_fp_bytes_read': 7482, 'msg': None, '_request_url': None, '_pool': <urllib3.connectionpool.HTTPConnectionPool object at 0x7f2588117e10>, '_connection': None, 'chunked': True, 'chunk_left': None, 'length_remaining': None}
Two values matter here:
status: 503, which triggers the retry process
Retry-After: 86400
Because Retry-After is set to 86400 seconds, my requests.Session() pauses for one entire day.
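To make the one-day pause concrete, here is a minimal standard-library sketch of how a Retry-After value maps to a delay. It is an illustration only, not urllib3's code; the function name retry_after_seconds and its now parameter are made up for this example. A Retry-After value is either a number of seconds or an HTTP date.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the delay, in seconds, encoded by a Retry-After header value.

    Hypothetical helper for illustration; not part of requests or urllib3.
    """
    value = value.strip()
    # Form 1: a plain number of seconds, e.g. "86400".
    if re.fullmatch(r"[0-9]+", value):
        return float(value)
    # Form 2: an HTTP date, e.g. "Thu, 21 Nov 2019 18:40:39 GMT".
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no waiting at all.
    return max(0.0, (when - now).total_seconds())


# The header value from the response above, converted to hours.
print(retry_after_seconds("86400") / 3600)  # 24.0
```

So a client that honours this header without any cap will sleep for 24 hours before its next attempt.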
Here is the piece of code responsible:
    def sleep_for_retry(self, response=None):
        retry_after = self.get_retry_after(response)
        if retry_after:
            time.sleep(retry_after)  # stops here
            return True
        return False
Source: urllib3/util/retry.py, line 277.
The respect_retry_after_header parameter in the __init__ of the Retry object can be set to False so that the Retry-After header is ignored.
    def __init__(
        self,
        total=10,
        connect=None,
        read=None,
        redirect=None,
        status=None,
        method_whitelist=DEFAULT_METHOD_WHITELIST,
        status_forcelist=None,
        backoff_factor=0,
        raise_on_redirect=True,
        raise_on_status=True,
        history=None,
        respect_retry_after_header=True,
        remove_headers_on_redirect=DEFAULT_REDIRECT_HEADERS_BLACKLIST,
    ):
        ...  # body omitted
Source: urllib3/util/retry.py, line 174.
Do you know how to override the respect_retry_after_header parameter from my requests.Session()?
Upvotes: 2
Views: 2032
Reputation: 4771
While the other answer (overriding parse_retry_after) is likely to work, the documented way to control retries is to pass a urllib3 Retry object to a requests HTTPAdapter and mount that adapter on a Session object. It works like this:
import urllib3
import requests
import requests.adapters
retry = urllib3.Retry(respect_retry_after_header=False)
adapter = requests.adapters.HTTPAdapter(max_retries=retry)
session = requests.Session()
session.mount("http://", adapter)
r = session.get("http://www.trouverlesmots.com")
print(r.status_code, r.headers)
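The snippet above mounts the adapter for http:// only. As a variation, here is a sketch that covers both schemes and checks the configuration without sending a request. The helper name build_session and the choice of total=2 (taken from the Retry shown in the question's dump) are assumptions for this example.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total=2):
    """Return a Session whose retry policy ignores the Retry-After header.

    Hypothetical helper; total=2 mirrors the Retry shown in the question.
    """
    retry = Retry(total=total, respect_retry_after_header=False)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount for both schemes so a redirect to https:// uses the same policy.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = build_session()
for scheme in ("http://", "https://"):
    # get_adapter() returns the adapter mounted for the matching prefix.
    policy = session.get_adapter(scheme + "example.org").max_retries
    print(scheme, policy.total, policy.respect_retry_after_header)
```

As far as I can tell, with the header ignored and no status_forcelist set, a 503 is not retried at all and is simply returned to the caller, so r.status_code would be 503 rather than the call blocking.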
Upvotes: 4
Reputation: 106891
Since sleep_for_retry calls get_retry_after, which calls parse_retry_after to parse the Retry-After header value, you can override parse_retry_after with a wrapper function that caps its return value with the min function (the example below caps it at 10 seconds):
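from urllib3.util.retry import Retry
# Keep a reference to the original method so the wrapper can delegate to it.
orig_parse_retry_after = Retry.parse_retry_after
# Patch the class itself: every Retry instance now waits at most 10 seconds.
Retry.parse_retry_after = lambda self, retry_after: min(10, orig_parse_retry_after(self, retry_after))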
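If you would rather not patch the Retry class globally, the same cap can be sketched as a subclass. The class name CappedRetry and the attribute MAX_RETRY_AFTER are made up for this example; only parse_retry_after comes from urllib3.

```python
from urllib3.util.retry import Retry


class CappedRetry(Retry):
    """Retry policy that honours Retry-After but never waits longer than a cap.

    Hypothetical subclass for illustration; not part of urllib3.
    """

    MAX_RETRY_AFTER = 10  # seconds; assumed cap, adjust as needed

    def parse_retry_after(self, retry_after):
        # Delegate parsing to urllib3, then clamp the resulting delay.
        return min(self.MAX_RETRY_AFTER, super().parse_retry_after(retry_after))


retry = CappedRetry(total=2)
print(retry.parse_retry_after("86400"))  # 10
print(retry.parse_retry_after("3"))      # 3
```

An instance of this class can be passed as max_retries to a requests HTTPAdapter, so only the sessions that mount that adapter get the cap.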
Upvotes: 0