sashaboulouds

Reputation: 1854

How to override respect_retry_after_header in urllib3 from requests?

When I request the URL http://www.trouverlesmots.com, I get back this response:

{'headers': HTTPHeaderDict({'Date': 'Wed, 20 Nov 2019 18:40:39 GMT', 'Server': 'Apache/2.4.41 (Unix)', 'X-Powered-By': 'PHP/7.1.33', 'Expires': 'Wed, 11 Jan 1984 05:00:00 GMT', 'Cache-Control': 'no-cache, must-revalidate, max-age=0', 'Retry-After': '86400', 'Vary': 'User-Agent', 'Connection': 'close', 'Transfer-Encoding': 'chunked', 'Content-Type': 'text/html; charset=UTF-8'}), 'status': 503, 'version': 11, 'reason': 'Service Temporarily Unavailable', 'strict': 0, 'decode_content': False, 'retries': Retry(total=2, connect=None, read=None, redirect=None, status=None), 'enforce_content_length': False, 'auto_close': True, '_decoder': None, '_body': None, '_fp': <http.client.HTTPResponse object at 0x7f2588117940>, '_original_response': <http.client.HTTPResponse object at 0x7f2588117940>, '_fp_bytes_read': 7482, 'msg': None, '_request_url': None, '_pool': <urllib3.connectionpool.HTTPConnectionPool object at 0x7f2588117e10>, '_connection': None, 'chunked': True, 'chunk_left': None, 'length_remaining': None}

Two parameters are involved: retry_after and respect_retry_after_header.

The Retry-After header is set to 86400 (seconds), so my requests.Session() pauses for an entire day.

Here is the piece of code responsible:

    def sleep_for_retry(self, response=None):
        retry_after = self.get_retry_after(response)
        if retry_after:
            time.sleep(retry_after)  # stops here
            return True

        return False

From urllib3/util/retry.py:277.
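
To illustrate where the one-day pause comes from, here is a minimal sketch that feeds the header value from the response above into Retry.parse_retry_after, the method that get_retry_after relies on. It runs offline; the variable names are mine and only for illustration.

```python
from urllib3.util.retry import Retry

# Header value taken from the 503 response shown above.
header_value = "86400"

# parse_retry_after turns the raw Retry-After value into a number of seconds;
# sleep_for_retry then passes that number to time.sleep().
seconds = Retry(total=2).parse_retry_after(header_value)
print(seconds, "seconds =", seconds / 3600, "hours")
```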

respect_retry_after_header can be set to False in the __init__ of the Retry object, so that the Retry-After header is ignored.

    def __init__(
        self,
        total=10,
        connect=None,
        read=None,
        redirect=None,
        status=None,
        method_whitelist=DEFAULT_METHOD_WHITELIST,
        status_forcelist=None,
        backoff_factor=0,
        raise_on_redirect=True,
        raise_on_status=True,
        history=None,
        respect_retry_after_header=True,
        remove_headers_on_redirect=DEFAULT_REDIRECT_HEADERS_BLACKLIST,
    ):

From urllib3/util/retry.py:174.

Do you know how to override the respect_retry_after_header parameter from my requests.Session()?

Upvotes: 2

Views: 2032

Answers (2)

Quentin Pradet

Reputation: 4771

While overriding parse_retry_after (as suggested in the other answer) is likely to work, the documented way to control retries is to pass a urllib3 Retry object to a requests HTTPAdapter and mount that adapter on a Session object. It works like this:

    import urllib3
    import requests
    import requests.adapters

    # Build a retry policy that ignores the server's Retry-After header.
    retry = urllib3.Retry(respect_retry_after_header=False)
    # Hand the policy to an adapter and mount it for plain-HTTP URLs.
    adapter = requests.adapters.HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    r = session.get("http://www.trouverlesmots.com")
    print(r.status_code, r.headers)
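
If you want to check the configuration without hitting the network, a sketch along these lines may help. The make_session helper and its parameters are hypothetical names of mine, not part of requests or urllib3; it mounts the same adapter for both http:// and https:// and then reads the policy back through Session.get_adapter.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(respect_retry_after=False, total=2):
    """Hypothetical helper: a Session whose adapters share one Retry policy."""
    retry = Retry(total=total, respect_retry_after_header=respect_retry_after)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount for both schemes so redirects to HTTPS use the same policy.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_session()
# get_adapter returns the adapter that would handle this URL; no request is sent.
adapter = session.get_adapter("http://www.trouverlesmots.com")
print(adapter.max_retries.respect_retry_after_header)
```

As far as I can tell, with the flag off and no status_forcelist set, a 503 is simply returned to the caller instead of being retried, so add status_forcelist=[503] and a backoff_factor if you still want retries on your own schedule.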

Upvotes: 4

blhsing

Reputation: 106891

Since sleep_for_retry calls get_retry_after, which calls parse_retry_after to parse the Retry-After header value, you can override parse_retry_after with a wrapper function that caps its return value with the min function (the example below caps it at 10 seconds):

    from urllib3.util.retry import Retry

    orig_parse_retry_after = Retry.parse_retry_after
    Retry.parse_retry_after = lambda self, retry_after: min(10, orig_parse_retry_after(self, retry_after))
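
If you would rather not patch the Retry class globally, the same cap can probably be expressed as a subclass. This is only a sketch: CappedRetry and MAX_RETRY_AFTER are names I made up, and it assumes parse_retry_after keeps the (self, retry_after) signature shown above.

```python
from urllib3.util.retry import Retry


class CappedRetry(Retry):
    """Hypothetical subclass: honours Retry-After, but never waits longer than the cap."""

    MAX_RETRY_AFTER = 10  # seconds; illustrative value, same as the example above

    def parse_retry_after(self, retry_after):
        # Let urllib3 parse the header, then clamp the result.
        return min(self.MAX_RETRY_AFTER, super().parse_retry_after(retry_after))


retry = CappedRetry(total=2)
print(retry.parse_retry_after("86400"))  # capped
print(retry.parse_retry_after("3"))      # below the cap, left unchanged
```

An instance of this class can then be passed as max_retries to an HTTPAdapter in the usual way, and only sessions using that adapter are affected.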

Upvotes: 0
