PhE

Reputation: 16644

Cache object instances with lru_cache and __hash__

I don't understand how functools.lru_cache works with object instances. I assumed the class has to provide a __hash__ method, so that any instance with the same hash would hit the cache.

Here is my test :

from functools import lru_cache

class Query:
    def __init__(self, id: int):
        self.id = id

    def __hash__(self):
        return hash(self.id)

@lru_cache()
def fetch_item(item):
    return 'data'

o1 = Query(33)
o2 = Query(33)
o3 = 33

assert hash(o1) == hash(o2) == hash(o3)

fetch_item(o1)  # <-- expecting miss
fetch_item(o1)  # <-- expecting hit
fetch_item(o2)  # <-- expecting hit BUT get a miss !
fetch_item(o3)  # <-- expecting hit BUT get a miss !
fetch_item(o3)  # <-- expecting hit

info = fetch_item.cache_info()
print(info)

assert info.hits == 4
assert info.misses == 1
assert info.currsize == 1

How can I cache calls made with object instances that have the same hash?

Upvotes: 3

Views: 6160

Answers (2)

sj95126

Reputation: 6898

Short answer: in order to get a cache hit on o2 when o1 is already in the cache, the class needs to define an __eq__() method, so that Query objects are compared by value.

For example:

def __eq__(self, other):
    return isinstance(other, Query) and self.id == other.id
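
Putting it together, a minimal sketch with both methods (reusing the fetch_item wrapper from the question):

from functools import lru_cache

class Query:
    def __init__(self, id: int):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        return isinstance(other, Query) and self.id == other.id

@lru_cache()
def fetch_item(item):
    return 'data'

fetch_item(Query(33))  # miss
fetch_item(Query(33))  # hit: equal hash *and* equal value
print(fetch_item.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)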

Update: one detail worth mentioning up front rather than buried below: the behavior described here also applies to the functools.cache wrapper introduced in Python 3.9, since @cache is simply a shortcut for @lru_cache(maxsize=None).

Long answer (including o3):

There's a good explanation here of the exact mechanism behind dictionary lookups, so I won't recreate it all. Suffice it to say: since the LRU cache is stored as a dict, objects must compare equal to count as already present in the cache, because of the way dictionary keys are compared.

You can see this in a quick example with a regular dictionary, with two versions of the class where one uses __eq__() and the other doesn't:
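
For reference, the two variants might look like this (a minimal sketch; only __eq__ differs, the names match the session below):

class Query_with_eq:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        return isinstance(other, Query_with_eq) and self.id == other.id

class Query_without_eq:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)
    # no __eq__, so equality falls back to object identity

With those definitions: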

>>> o1 = Query_with_eq(33)
>>> o2 = Query_with_eq(33)
>>> {o1: 1, o2: 2}
{<__main__.Query_with_eq object at 0x6fffffea9430>: 2}

which results in one item in the dictionary, because the keys are equal, whereas:

>>> o1 = Query_without_eq(33)
>>> o2 = Query_without_eq(33)
>>> {o1: 1, o2: 2}
{<__main__.Query_without_eq object at 0x6fffffea9cd0>: 1, <__main__.Query_without_eq object at 0x6fffffea9c70>: 2}

results in two items (unequal keys).

Why int doesn't result in a cache hit when a Query object exists:

o3 is a regular int object. Even if its value compared equal to Query(33) (which would require Query.__eq__() to handle non-Query types), lru_cache has an optimization that bypasses that comparison entirely.

Normally, lru_cache creates a dictionary key (as a tuple) from the arguments to the wrapped function. Optionally, if the cache was created with the typed=True argument, it also stores the type of each argument, so that values are only equal if they are also of the same type.

The optimization is that if there is only one argument to the wrapped function and its type is int or str, the single argument is used directly as the dictionary key instead of being wrapped in a tuple. Therefore, (Query(33),) and 33 don't compare as equal, even though they effectively store the same value. (Note that I'm not saying int objects aren't cached, only that they don't match an existing value of a non-int type. From your example, you can see that fetch_item(o3) gets a cache hit on the second call.)
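
You can see the fast path directly with the private functools._make_key helper (an internal, undocumented function, shown here purely for illustration):

from functools import _make_key

# A single positional int (or str) argument is used as the key itself...
print(_make_key((33,), {}, False))    # 33

# ...while any other type gets wrapped in a _HashedSeq (a list subclass)
print(_make_key((33.0,), {}, False))  # [33.0]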

You can get cache hits if the argument is of a different type than int. For example, 33.0 would match, again presuming that Query.__eq__() takes types into account and returns True. For that, you could do something like:

def __eq__(self, other):
    if isinstance(other, Query):
        return self.id == other.id
    else:
        return self.id == other
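
With that looser __eq__ (together with the original __hash__), a float argument can hit an entry cached for a Query, since 33.0 skips the int/str fast path and hash(33.0) == hash(33). A quick sketch:

from functools import lru_cache

class Query:
    def __init__(self, id: int):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        if isinstance(other, Query):
            return self.id == other.id
        else:
            return self.id == other

@lru_cache()
def fetch_item(item):
    return 'data'

fetch_item(Query(33))  # miss: cache is empty
fetch_item(33.0)       # hit: 33.0 == Query(33) via Query.__eq__
print(fetch_item.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)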

Upvotes: 9

Ashwini Chaudhary

Reputation: 250871

Even though lru_cache() expects its arguments to be hashable, it doesn't rely on their hash values alone: the cache is a dictionary, so keys must also compare equal. That's why you're getting those misses.

The function _make_key uses _HashedSeq to make sure all the items in the key are hashable, but the lookup later on in _lru_cache_wrapper is an ordinary dict lookup, for which a matching hash value alone isn't enough.

(_HashedSeq is skipped if there's only one argument and it is of type int or str.)

class _HashedSeq(list):
    """ This class guarantees that hash() will be called no more than once
        per element.  This is important because the lru_cache() will hash
        the key multiple times on a cache miss.
    """

    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue

Tracing your example:

fetch_item(o1)  # misses and stores (o1,) in the cache dictionary
fetch_item(o1)  # finds (o1,) in the cache dictionary
fetch_item(o2)  # looks for (o2,), misses, and stores (o2,)
fetch_item(o3)  # looks for 33 (the int fast path), misses, and stores 33
fetch_item(o3)  # finds 33 in the cache dictionary

Unfortunately, there isn't a documented way to provide a custom make_key function, so one way to achieve this is by monkey-patching the internal _make_key function (within a context manager). Note that this relies on lru_cache picking up the Python-level _make_key; if the C-accelerated _lru_cache_wrapper from _functools is in use, the patch may not take effect.

import functools
from contextlib import contextmanager


def make_key(*args, **kwargs):
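    # _make_key is called as _make_key(args, kwds, typed), so args[0] here is
    # the tuple of positional arguments passed to the wrapped function; using
    # the first argument's hash as the key makes Query(33) and 33 share one
    # cache entry (any other hash collision would share an entry too)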
    return hash(args[0][0])


def fetch_item(item):
    return 'data'

@contextmanager
def lru_cached_fetch_item():
    _make_key_og = functools._make_key
    functools._make_key = make_key
    try:
        yield functools.lru_cache()(fetch_item)
    finally:
        functools._make_key = _make_key_og


class Query:
    def __init__(self, id: int):
        self.id = id

    def __hash__(self):
        return hash(self.id)


o1 = Query(33)
o2 = Query(33)
o3 = 33

assert hash(o1) == hash(o2) == hash(o3)

with lru_cached_fetch_item() as func:
    func(o1)  # miss
    func(o1)  # hit
    func(o2)  # hit: thanks to the patched key function
    func(o3)  # hit: same hash as o1 and o2
    func(o3)  # hit

info = func.cache_info()
print(info) # CacheInfo(hits=4, misses=1, maxsize=128, currsize=1)
assert info.hits == 4
assert info.misses == 1
assert info.currsize == 1

Upvotes: 4
