Bob

Reputation: 508

Caching Python function results using only a subset of arguments as identifier

Is there an easy way to cache function results in Python based on a single identifier argument? For example, suppose my function has 3 arguments: arg1, arg2 and id. Is there a simple way to cache the function result based only on the value of id? That is, whenever id takes the same value, the cached function would return the same result, regardless of arg1 and arg2.

Background: I have a time-consuming and repeatedly called function, in which arg1 and arg2 are lists and dictionaries composed of large numpy arrays. These are not hashable, so functools.lru_cache doesn't work as is. Yet there are only a handful of specific combinations of arg1 and arg2. Hence my idea to manually specify some id which takes a unique value for each possible combination of arg1 and arg2.

Upvotes: 5

Views: 4914

Answers (3)

qwr

Reputation: 10929

The best approach may be to write your own simple decorator, as @DarrylG says, e.g.

from functools import wraps 

def memoize_first(func):
    """Memoize like functools.cache, but only consider first argument.

    Adapted from https://wiki.python.org/moin/PythonDecoratorLibrary
    """
    cache = func.cache = {}

    @wraps(func)
    def memoizer(arg1, *args):
        if arg1 not in cache:
            cache[arg1] = func(arg1, *args)
        return cache[arg1]
    return memoizer

Obviously, once the result is cached, the other arguments no longer affect the result, so the arguments passed on the first call for a given key must be the correct ones.
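To illustrate that caveat, here is a minimal usage sketch. It repeats the decorator so the snippet runs on its own; slow_sum and the calls list are hypothetical names used only for this demonstration.

```python
from functools import wraps

def memoize_first(func):
    """Memoize on the first positional argument only."""
    cache = func.cache = {}

    @wraps(func)
    def memoizer(arg1, *args):
        if arg1 not in cache:
            cache[arg1] = func(arg1, *args)
        return cache[arg1]
    return memoizer

calls = []  # hypothetical: records every real (uncached) call

@memoize_first
def slow_sum(key, values):
    calls.append(key)
    return sum(values)

first = slow_sum("a", [1, 2, 3])   # computed from [1, 2, 3]
second = slow_sum("a", [10, 20])   # cache hit: the new list is ignored
third = slow_sum("b", [10, 20])    # new key, so the function runs again
```

Here second equals first even though a different list was passed, because only the key "a" is looked up.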

Another twist: I think you can use function attributes, as @python_user mentions, to keep the large arguments out of the cache key, although this is not elegant because you have to set those attributes separately from the function arguments. If arg1 and arg2 are constants, then plain globals are a fine alternative.

from functools import cache

@cache
def f(id):
    return compute(id, f.arg1, f.arg2) 

f.arg1 = big_list
f.arg2 = other_big_list
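One thing to watch with this pattern is that changing the attributes later does not invalidate results that are already cached. The sketch below shows this under some assumptions: compute is a hypothetical stand-in for the expensive function, the argument is named key rather than id to avoid shadowing the builtin, and lru_cache(maxsize=None) is used in place of functools.cache (which is a thin wrapper around it) so that it also runs on Python versions before 3.9.

```python
from functools import lru_cache

def compute(key, arg1, arg2):
    # Hypothetical stand-in for the expensive computation.
    return key, sum(arg1) + sum(arg2)

@lru_cache(maxsize=None)
def f(key):
    # Only key is part of the cache key; the big inputs are read from attributes.
    return compute(key, f.arg1, f.arg2)

f.arg1 = [1, 2, 3]
f.arg2 = [4, 5]
before = f("a")      # computed with the current attributes

f.arg1 = [100]       # changing an attribute does not touch the cache
stale = f("a")       # cache hit: still the old result

f.cache_clear()      # cache_clear() is provided by lru_cache
fresh = f("a")       # recomputed with the new attributes
```

If the attributes can change, either call f.cache_clear() after each change or use a new key for the new combination.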

Upvotes: 1

python_user

Reputation: 7083

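from functools import wraps

def cache(fun):
    # The dict lives on the decorator itself (cache.cache_), so it is shared
    # by every decorated function and reset each time @cache is applied.
    cache.cache_ = {}

    @wraps(fun)
    def inner(arg1, arg2, id):
        if id not in cache.cache_:
            print(f'Caching {id}')  # to check when it is cached
            cache.cache_[id] = fun(arg1, arg2, id)
        return cache.cache_[id]
    return inner

@cache
def function(arg1, arg2, arg3):
    print('something')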

You can create your own decorator, as suggested by DarrylG. To check that it only caches new values of id, add a print(cache.cache_) inside the if id not in cache.cache_: branch.

Writing cache.cache_ makes cache_ a function attribute (PEP 232) of the decorator. That gives you direct access to the dictionary that stores the results, so when you want to reset it you can use cache.cache_.clear().

function(1, 2, 'a')
function(11, 22, 'b')
function(11, 22, 'a')
function([111, 11], 222, 'a')

print(f'Cache {cache.cache_}') # view previously cached results
cache.cache_.clear() # clear cache
print(f'Cache {cache.cache_}') # cache is now empty

# call some function again to populate cache
function(1, 2, 'a')
function(11, 22, 'b')
function(11, 22, 'a')
function([111, 11], 222, 'a')

Edit: Addressing a new comment by @Bob (OP): in most cases, returning a reference to the same cached object is sufficient. OP's use case, however, seems to require a fresh copy of the result on each call, presumably because callers treat the result of function(arg1, arg2, arg3) as unique to arg1, arg2 and arg3, whereas inside the cache decorator uniqueness is defined by id alone. In that case, returning the same reference to a mutable object would lead to undesired behavior. As mentioned in the same comment, the return statement in the inner function should be changed from return cache.cache_[id] to return copy.deepcopy(cache.cache_[id]), which also requires import copy.
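A minimal sketch of that deepcopy variant follows. It is not the exact code above: as an assumption for illustration, each decorated function gets its own dictionary (exposed as inner.cache_) instead of the shared cache.cache_, and cache_by_id and build are hypothetical names.

```python
import copy
from functools import wraps

def cache_by_id(fun):
    """Cache on the third argument only and return a deep copy on each call."""
    store = {}  # one dict per decorated function (an assumption of this sketch)

    @wraps(fun)
    def inner(arg1, arg2, id):
        if id not in store:
            store[id] = fun(arg1, arg2, id)
        # Hand out a copy so callers cannot mutate the cached object.
        return copy.deepcopy(store[id])

    inner.cache_ = store  # exposed so it can be inspected or cleared
    return inner

@cache_by_id
def build(arg1, arg2, id):
    return {"id": id, "values": [arg1, arg2]}

r1 = build(1, 2, "a")
r1["values"].append(99)   # mutates only the caller's copy
r2 = build(7, 8, "a")     # cache hit: arguments 7 and 8 are ignored
```

The copy costs time and memory on every call, so it is only worth it when callers really do modify the returned object.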

Upvotes: 4

madbird

Reputation: 1379

I think you could move the extra arguments to a separate function (the caller) and cache only the part that depends on the identifier, like below:

import functools

def get_and_update(a, b, c):
    return {'a': a, 'b': b, 'c': c}

# ->

@functools.lru_cache
def get_by_a(a):
    return {}

def get_and_update(a, b, c):
    res = get_by_a(a)
    res.update(a=a, b=b, c=c)
    return res

x1 = get_and_update('x', 1, 2)
x2 = get_and_update('x', 2, 3)
assert x1 is x2
print(x1, x2, sep='\n')
# Output:
# {'a': 'x', 'b': 2, 'c': 3}
# {'a': 'x', 'b': 2, 'c': 3}

Upvotes: 1
