nowox

Reputation: 29126

How to cache/memoize objects in Python?

I have some objects that are very slow to instantiate. They are representations of data loaded from external sources such as YAML files, and loading large YAML files is slow (I don't know why).

I know these objects depend on some external factors, such as the files they read and the environment variables they use.

Ideally I would like a transparent, non-boilerplate method to cache these objects as long as the external factors are unchanged:

import os

@cache(depfiles=('foo',), depvars=(os.environ['FOO'],))
class Foo():
    def __init__(self, *args, **kwargs):
        with open('foo') as fd:
            self.foo = fd.read()
        self.FOO = os.environ['FOO']
        self.args = args
        self.kwargs = kwargs

The main idea is that the first time I instantiate Foo, a cache file is created with the content of the object; the next time I instantiate it (in another Python session), the cache file is used only if none of the dependencies and arguments have changed.
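A minimal sketch of how such a `cache` decorator could work (this is not a standard library facility; the key construction and file naming are my assumptions). It hashes the dependency files' modification times together with `depvars` and the constructor arguments, and pickles the instance's `__dict__` to disk under that key:

```python
import hashlib
import os
import pickle

def cache(depfiles=(), depvars=()):
    """Sketch of a class decorator: pickle instances to disk, keyed on the
    dependency files' modification times, the dependency variables, and the
    constructor arguments.  Assumes mtimes are a good enough proxy for
    file content."""
    def decorator(cls):
        orig_init = cls.__init__

        def cached_init(self, *args, **kwargs):
            # Everything the instance depends on goes into the key.
            mtimes = tuple(os.path.getmtime(f) for f in depfiles)
            key = hashlib.sha1(
                repr((mtimes, depvars, args, kwargs)).encode()
            ).hexdigest()
            path = '{}-{}.cache'.format(cls.__name__, key)
            if os.path.exists(path):
                # Cache hit: restore the attributes, skip the slow work.
                with open(path, 'rb') as fd:
                    self.__dict__.update(pickle.load(fd))
                return
            orig_init(self, *args, **kwargs)
            with open(path, 'wb') as fd:
                pickle.dump(self.__dict__, fd)

        cls.__init__ = cached_init
        return cls
    return decorator
```

Storing `self.__dict__` rather than the instance itself keeps the pickling simple and avoids re-entering the class's own construction machinery on load.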

The solution I've found so far is based on shelve:

import shelve
import time

class Foo(object):
    _cached = False
    def __new__(cls, *args, **kwargs):
        cache = shelve.open('cache')
        cache_foo = cache.get(cls.__name__)
        cache.close()
        if isinstance(cache_foo, Foo):
            cache_foo._cached = True
            return cache_foo
        # object.__new__ does not accept the extra arguments
        return super(Foo, cls).__new__(cls)

    def __init__(self, *args, **kwargs):
        if self._cached:
            return

        time.sleep(2)  # Lots of work
        self.answer = 42

        cache = shelve.open('cache')
        cache[self.__class__.__name__] = self
        cache.close()  # close() also flushes the shelf

It works as is, but it is boilerplate-heavy and it doesn't cover all the cases.

Is there any native solution to achieve similar behavior in Python?

Upvotes: 0

Views: 1055

Answers (1)

Duncan

Reputation: 95732

Python 3 provides the functools.lru_cache() decorator for memoizing callables, but I think you're asking to preserve the cache across multiple runs of your application, and at that point there is such a variety of differing requirements that you're unlikely to find a 'one size fits all' solution.
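For the in-process case, lru_cache can memoize a factory function that builds the object (Config and get_config here are illustrative names, not part of the question); the cache is simply lost when the interpreter exits:

```python
from functools import lru_cache

class Config:
    """Illustrative stand-in for a slow-to-build object."""
    def __init__(self, path):
        with open(path) as fd:
            self.text = fd.read()

@lru_cache(maxsize=None)
def get_config(path):
    # Same path -> the same Config instance, built once per process.
    return Config(path)
```

Calling `get_config()` twice with the same path returns the identical instance; `get_config.cache_info()` reports the hit/miss counters.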

If your own answer works for you, then use it. As far as 'too much boilerplate' is concerned, I would extract the caching into a separate mixin class. Also, the first reference to Foo in __new__ probably ought to be cls in any case, and you can use the __qualname__ attribute instead of cls.__name__ to reduce the likelihood of class-name conflicts (assuming Python 3.3 or later).
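A sketch of that mixin (the name ShelveCacheMixin and the 'cache' filename are mine). One deliberate change from the question's code: the instance's __dict__ is shelved rather than the instance itself, so that unpickling never re-enters the overridden __new__:

```python
import shelve

class ShelveCacheMixin(object):
    """Mixin holding the shelve plumbing; subclasses keep only the
    expensive work.  Stores __dict__ snapshots keyed by __qualname__."""
    _cached = False

    def __new__(cls, *args, **kwargs):
        with shelve.open('cache') as db:
            state = db.get(cls.__qualname__)
        self = super().__new__(cls)
        if state is not None:
            self.__dict__.update(state)  # restore the cached attributes
            self._cached = True
        return self

    def _store(self):
        with shelve.open('cache') as db:
            db[type(self).__qualname__] = dict(self.__dict__)

class Foo(ShelveCacheMixin):
    def __init__(self):
        if self._cached:
            return
        self.answer = 42  # stands in for the slow YAML loading
        self._store()
```

The second `Foo()` (even in a later session, as long as the 'cache' file survives) gets its attributes from the shelf and skips the work.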

Upvotes: 1
