Does requests_cache automatically update cache on update of info?

Question

I have a very unreliable API that I request using Python. I have been thinking about using requests_cache and setting expire_after to be 999999999999 like I have seen other people do. The only problem is, I do not know if when the API works again, that if the data is updated. If requests_cache will automatically auto-update and delete the old entry.

I have tried reading the docs but I cannot really see this anywhere.

Martijn Pieters · Accepted Answer

requests_cache will not update until the expire_after time has passed. In that case it will not detect that your API is back to a working state.

I note that the project has since added an option that I implemented in the past; you can now set the old_data_on_error option when configuring the cache; see the CachedSession documentation:

old_data_on_error – If True it will return expired cached response if update fails.

It would reuse existing cache data in case a backend update fails, rather than delete that data.

In the past, I created my own requests_cache session setup (plus small patch) that would reuse cached values beyond expire_after if the backend gave a 500 error or timed out (using short timeouts) to deal with a problematic API layer, rather than rely on expire_after:

import logging

from datetime import (
    datetime,
    timedelta
)
from requests.exceptions import (
    ConnectionError,
    Timeout,
)
from requests_cache.core import (
    dispatch_hook,
    CachedSession,
)

log = logging.getLogger(__name__)
# Stop logging from complaining if no logging has been configured.
log.addHandler(logging.NullHandler())


class FallbackCachedSession(CachedSession):
    """Cached session that'll reuse expired cache data on timeouts

    This allows survival in case the backend is down, living of stale
    data until it comes back.

    """

    def send(self, request, **kwargs):
        # this *bypasses* CachedSession.send; we want to call the method
        # CachedSession.send() would have delegated to!
        session_send = super(CachedSession, self).send
        if (self._is_cache_disabled or
                request.method not in self._cache_allowable_methods):
            response = session_send(request, **kwargs)
            response.from_cache = False
            return response

        cache_key = self.cache.create_key(request)

        def send_request_and_cache_response(stale=None):
            try:
                response = session_send(request, **kwargs)
            except (Timeout, ConnectionError):
                if stale is None:
                    raise
                log.warning('No response received, reusing stale response for '
                            '%s', request.url)
                return stale

            if stale is not None and response.status_code == 500:
                log.warning('Response gave 500 error, reusing stale response '
                            'for %s', request.url)
                return stale

            if response.status_code in self._cache_allowable_codes:
                self.cache.save_response(cache_key, response)
            response.from_cache = False
            return response

        response, timestamp = self.cache.get_response_and_time(cache_key)
        if response is None:
            return send_request_and_cache_response()

        if self._cache_expire_after is not None:
            is_expired = datetime.utcnow() - timestamp > self._cache_expire_after
            if is_expired:
                self.cache.delete(cache_key)
                # try and get a fresh response, but if that fails reuse the
                # stale one
                return send_request_and_cache_response(stale=response)

        # dispatch hook here, because we've removed it before pickling
        response.from_cache = True
        response = dispatch_hook('response', request.hooks, response, **kwargs)
        return response


def basecache_delete(self, key):
    # We don't really delete; we instead set the timestamp to
    # datetime.min. This way we can re-use stale values if the backend
    # fails
    try:
        if key not in self.responses:
            key = self.keys_map[key]
        self.responses[key] = self.responses[key][0], datetime.min
    except KeyError:
        return

from requests_cache.backends.base import BaseCache
BaseCache.delete = basecache_delete

The above subclass of CachedSession bypasses the original send() method to instead go directly to the original requests.Session.send() method, to return existing cached value even if the timeout has passed but the backend has failed. Deletion is disabled in favour of setting the timeout value to 0, so we can still reuse that old value if a new request fails.

Use the FallbackCachedSession instead of a regular CachedSession object.

If you wanted to use requests_cache.install_cache(), make sure to pass in FallbackCachedSession to that function in the session_factory keyword argument:

import requests_cache

requests_cache.install_cache(
    'cache_name', backend='some_backend', expire_after=180,
    session_factory=FallbackCachedSession)

My approach is a little more comprehensive than what requests_cache implemented some time after I hacked together the above; my version will fall back to a stale response even if you explicitly marked it as deleted before.

Does requests_cache automatically update cache on update of info?

Answers (2)

Related Questions