jasonh

Reputation: 191

Memoization of method working on python 3.6 but not on 3.7.3

I use a decorator to extend memoization via lru_cache to methods of objects which aren't themselves hashable (following stackoverflow.com/questions/33672412/python-functools-lru-cache-with-class-methods-release-object). This memoization works fine with python 3.6 but shows unexpected behavior on python 3.7.

Observed behavior: If the memoized method is called with keyword arguments, memoization works fine on both python versions. If it's called with positional arguments only, it works on 3.6 but not on 3.7.

==> What could cause the different behavior?

The code sample below shows a minimal example which reproduces the behavior.

test_memoization_kwarg_call passes for both python 3.6 and 3.7. test_memoization_arg_call passes for python 3.6 but fails for 3.7.

import random
import weakref
from functools import lru_cache


def memoize_method(func):
    # From stackoverflow.com/questions/33672412/python-functools-lru-cache-with-class-methods-release-object
    def wrapped_func(self, *args, **kwargs):
        self_weak = weakref.ref(self)

        @lru_cache()
        def cached_method(*args_, **kwargs_):
            return func(self_weak(), *args_, **kwargs_)

        setattr(self, func.__name__, cached_method)
        print(args)
        print(kwargs)
        return cached_method(*args, **kwargs)

    return wrapped_func


class MyClass:
    @memoize_method
    def randint(self, param):
        return random.randint(0, int(1E9))


def test_memoization_kwarg_call():
    obj = MyClass()
    assert obj.randint(param=1) == obj.randint(param=1)
    assert obj.randint(1) == obj.randint(1)


def test_memoization_arg_call():
    obj = MyClass()
    assert obj.randint(1) == obj.randint(1)

Note that, weirdly, the line assert obj.randint(1) == obj.randint(1) does not lead to a test failure inside test_memoization_kwarg_call on either python version, but the same line fails on python 3.7 inside test_memoization_arg_call.

Python versions: 3.6.8 and 3.7.3, respectively.

Further info

user2357112 suggested inspecting the output of import dis; dis.dis(test_memoization_arg_call). On python 3.6 this gives

 36           0 LOAD_GLOBAL              0 (MyClass)
              2 CALL_FUNCTION            0
              4 STORE_FAST               0 (obj)

 37           6 LOAD_FAST                0 (obj)
              8 LOAD_ATTR                1 (randint)
             10 LOAD_CONST               1 (1)
             12 CALL_FUNCTION            1
             14 LOAD_FAST                0 (obj)
             16 LOAD_ATTR                1 (randint)
             18 LOAD_CONST               1 (1)
             20 CALL_FUNCTION            1
             22 COMPARE_OP               2 (==)
             24 POP_JUMP_IF_TRUE        30
             26 LOAD_GLOBAL              2 (AssertionError)
             28 RAISE_VARARGS            1
        >>   30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

On python 3.7 this gives

 36           0 LOAD_GLOBAL              0 (MyClass)
              2 CALL_FUNCTION            0
              4 STORE_FAST               0 (obj)

 37           6 LOAD_FAST                0 (obj)
              8 LOAD_METHOD              1 (randint)
             10 LOAD_CONST               1 (1)
             12 CALL_METHOD              1
             14 LOAD_FAST                0 (obj)
             16 LOAD_METHOD              1 (randint)
             18 LOAD_CONST               1 (1)
             20 CALL_METHOD              1
             22 COMPARE_OP               2 (==)
             24 POP_JUMP_IF_TRUE        30
             26 LOAD_GLOBAL              2 (AssertionError)
             28 RAISE_VARARGS            1
        >>   30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

The difference is that on 3.6 the call to the cached randint method yields LOAD_ATTR, LOAD_CONST, CALL_FUNCTION, while on 3.7 it yields LOAD_METHOD, LOAD_CONST, CALL_METHOD. This may explain the difference in behavior, but I do not know the internals of CPython (?) well enough to make sense of it. Any ideas?

Upvotes: 14

Views: 1604

Answers (3)

user2357112

Reputation: 281012

This is a bug specific to the Python 3.7.3 micro release. It was not present in Python 3.7.2, and it should not be present in Python 3.7.4 or 3.8.0. It was filed as Python issue 36650.

At C level, calls with no keyword arguments and calls with an empty **kwargs dict are handled differently. Depending on details of how a function is implemented, the function may receive NULL for kwargs instead of an empty kwargs dict. The C accelerator for functools.lru_cache treated calls with NULL kwargs differently from calls with an empty kwargs dict, leading to the bug you see here.

With the method cache recipe you're using, the first call to a method will always pass an empty kwargs dict to the C-level LRU wrapper, whether or not any keyword arguments were used, because of the return cached_method(*args, **kwargs) in wrapped_func. Subsequent calls may pass a NULL kwargs dict, because they no longer go through wrapped_func. This is why you could not reproduce the bug with test_memoization_kwarg_call; the first call has to pass no keyword arguments.
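The two call paths described above can be isolated from the decorator recipe. The following is a minimal sketch, not taken from the original report: square and call_via_wrapper are hypothetical names chosen for illustration. The first call is forwarded through a Python function with an empty **kwargs, the second call is made directly with no keyword arguments, and cache_info() shows whether both landed on the same cache entry.

```python
from functools import lru_cache


@lru_cache()
def square(x):
    # Hypothetical stand-in for the memoized method.
    return x * x


def call_via_wrapper(*args, **kwargs):
    # Mirrors `return cached_method(*args, **kwargs)` in wrapped_func:
    # kwargs is an empty dict here, not "no keyword arguments at all".
    return square(*args, **kwargs)


call_via_wrapper(3)  # first call, forwarded with an empty **kwargs
square(3)            # second call, made directly with no keyword arguments

info = square.cache_info()
print(info)
```

On an interpreter without the bug, the second call should be a cache hit (one hit, one miss). Going by the explanation above, 3.7.3 would be expected to record two misses instead, though that is an inference from the answer rather than something verified here.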

Upvotes: 4

acushner

Reputation: 9946

i've never said this about python before, but this honestly looks like a bug. i have no idea why it's happening, because all this stuff is in underlying C.

but here's what i'm seeing, attempting to peer into the black box:

i added some simple printing to your code:

def memoize_method(func):
    # From stackoverflow.com/questions/33672412/python-functools-lru-cache-with-class-methods-release-object
    def wrapped_func(self, *args, **kwargs):
        self_weak = weakref.ref(self)

        print('wrapping func')
        @lru_cache()
        def cached_method(*args_, **kwargs_):
            print('in cached_method', args_, kwargs_, id(cached_method))
            return func(self_weak(), *args_, **kwargs_)

        setattr(self, func.__name__, cached_method)
        return cached_method(*args, **kwargs)

    return wrapped_func

then i tested the function like this:

def test_memoization_arg_call():
    obj = MyClass()
    for _ in range(5):
        print(id(obj.randint), obj.randint(1), obj.randint.cache_info(), id(obj.randint))
    print()
    for _ in range(5):
        print(id(obj.randint), obj.randint(2), obj.randint.cache_info(), id(obj.randint))

here's the output:

==================================
wrapping func
in cached_method (1,) {} 4525448992
4521585800 668415661 CacheInfo(hits=0, misses=1, maxsize=128, currsize=1) 4525448992
in cached_method (1,) {} 4525448992
4525448992 920166498 CacheInfo(hits=0, misses=2, maxsize=128, currsize=2) 4525448992
4525448992 920166498 CacheInfo(hits=1, misses=2, maxsize=128, currsize=2) 4525448992
4525448992 920166498 CacheInfo(hits=2, misses=2, maxsize=128, currsize=2) 4525448992
4525448992 920166498 CacheInfo(hits=3, misses=2, maxsize=128, currsize=2) 4525448992

in cached_method (2,) {} 4525448992
4525448992 690871031 CacheInfo(hits=3, misses=3, maxsize=128, currsize=3) 4525448992
4525448992 690871031 CacheInfo(hits=4, misses=3, maxsize=128, currsize=3) 4525448992
4525448992 690871031 CacheInfo(hits=5, misses=3, maxsize=128, currsize=3) 4525448992
4525448992 690871031 CacheInfo(hits=6, misses=3, maxsize=128, currsize=3) 4525448992
4525448992 690871031 CacheInfo(hits=7, misses=3, maxsize=128, currsize=3) 4525448992

the interesting thing here is that it seems like it mis-caches the first positional args call. this doesn't happen with kwargs, and if you make a kwargs call first, it won't mis-cache that or any following pos args calls (which, for whatever reason, means your kwargs test is working). the important lines are these:

==================================
wrapping func
in cached_method (1,) {} 4525448992
4521585800 668415661 CacheInfo(hits=0, misses=1, maxsize=128, currsize=1) 4525448992
in cached_method (1,) {} 4525448992
4525448992 920166498 CacheInfo(hits=0, misses=2, maxsize=128, currsize=2) 4525448992
4525448992 920166498 CacheInfo(hits=1, misses=2, maxsize=128, currsize=2) 4525448992

you can see that i'm in function cached_method with id 4525448992 twice with the exact same args/kwargs, but it's not caching. it even shows the misses themselves in CacheInfo (first, the cache is empty. second, it can't find (1,) for some reason). that's all in C, so i don't know how to fix it...

i guess the best answer is to use another lru_cache method and wait for the devs to fix whatever's happening here.
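One possible stopgap that keeps the recipe from the question is to make every call reach the lru_cache wrapper the same way. The sketch below is an assumption on my part, not something from the original thread, and it has not been verified on 3.7.3: instead of storing cached_method itself on the instance, it stores a thin Python function (the hypothetical call_cached) that always forwards *args and **kwargs, so the first and later calls take an identical path into the C-level cache.

```python
import random
import weakref
from functools import lru_cache, wraps


def memoize_method(func):
    @wraps(func)
    def wrapped_func(self, *args, **kwargs):
        self_weak = weakref.ref(self)

        @lru_cache()
        def cached_method(*args_, **kwargs_):
            return func(self_weak(), *args_, **kwargs_)

        def call_cached(*args_, **kwargs_):
            # Hypothetical helper: always forward through **kwargs_ so that
            # every call reaches lru_cache in the same form.
            return cached_method(*args_, **kwargs_)

        # Keep the cache statistics reachable from the instance attribute.
        call_cached.cache_info = cached_method.cache_info
        setattr(self, func.__name__, call_cached)
        return call_cached(*args, **kwargs)

    return wrapped_func


class MyClass:
    @memoize_method
    def randint(self, param):
        return random.randint(0, int(1E9))


obj = MyClass()
first = obj.randint(1)
second = obj.randint(1)
print(first == second, obj.randint.cache_info())
```

The extra Python-level call adds a little overhead on every lookup, so this only makes sense as a temporary measure on an affected interpreter.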

edit: btw, great question.

Upvotes: 1

youknowone

Reputation: 1074

I have a simpler solution to the problem:

pip install methodtools

Then,

import random
from methodtools import lru_cache


class MyClass:
    @lru_cache()
    def randint(self, param):
        return random.randint(0, int(1E9))


def test_memoization_kwarg_call():
    obj = MyClass()
    assert obj.randint(param=1) == obj.randint(param=1)
    assert obj.randint(1) == obj.randint(1)

I am sorry that this does not answer the "why", but it may help if you are also interested in fixing the problem. This is tested with 3.7.3.

Upvotes: 2
