Reputation: 733
I need some kind of cache to store the result of a function f in Cython for future reuse. A simple FIFO cache policy that discards the least recently computed result when the cache is full will do just fine. I need the cache to be reinitialised every time I call another function from Python which uses the cache and calls f. I came up with the following solution using a std::map wrapped in an extension type:
# distutils: language = c++
import sys
import time
from libcpp.map cimport map as cppmap
from libcpp.utility cimport pair as cpppair
from libcpp.queue cimport queue as cppqueue
from cython.operator cimport dereference as deref

ctypedef cpppair[long, long] mapitem_t
ctypedef cppmap[long, long].iterator mi_t

cdef class Cache_map:
    """Cache container"""
    cdef:
        cppmap[long, long] _cache_data
        cppqueue[long] _order
        long _cachesize
        long _size

    def __init__(self, long cachesize=100):
        self._cachesize = cachesize
        self._size = 0

    cdef mi_t setitem(
            self, mi_t it, long key, long value):
        """Insert key/value pair into cache and return position"""
        if self._size >= self._cachesize:
            self._cache_data.erase(self._order.front())
            self._order.pop()
        else:
            self._size += 1
        self._order.push(key)
        return self._cache_data.insert(it, mapitem_t(key, value))

    @property
    def cache_data(self):
        return self._cache_data

cdef long f(long x):
    """Expensive function"""
    time.sleep(0.01)
    return x**2

cdef long cached_f(long x, Cache_map Cache):
    cdef mi_t search = Cache._cache_data.lower_bound(x)
    if search != Cache._cache_data.end() and x == deref(search).first:
        return deref(search).second
    return deref(Cache.setitem(search, x, f(x))).second

def use_cache():
    # Output container
    cdef list cache_size = []
    cdef list timings = []
    cdef list results = []
    cdef long i, r
    cdef Cache_map Cache = Cache_map(10)  # Initialise cache
    cache_size.append(sys.getsizeof(Cache))
    go = time.time()
    for i in range(100):
        # Silly loop using the cache
        for r in range(2):
            results.append(cached_f(i, Cache))
            timings.append(time.time() - go)
            go = time.time()
        cache_size.append(sys.getsizeof(Cache))
        go = time.time()
    return cache_size, timings, results
While this works in principle, it has a few drawbacks:
- I have to write cached_f explicitly to wrap f (not very reusable)
- I have to pass Cache to cached_f (unnecessarily expensive?)
- Cache_map is explicitly written to cache results from f (not very reusable)
I would imagine that this is quite a standard task, so is there a better way?
I tried, for example, to pass a pointer to the Cache to cached_f, but it seems I cannot create a pointer to an extension type object. The following:
cdef Cache_map Cache = Cache_map(10)
cdef Cache_map *Cache_ptr
Cache_ptr = &Cache
throws cache_map.pyx:66:16: Cannot take address of Python variable 'Cache'.
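A note on the cost of passing Cache, offered as a sketch rather than a definitive answer: an extension type instance is an ordinary Python object, so a typed argument such as Cache_map Cache is handed over as a reference to the existing object and is not copied. The plain-Python example below (Box and bump are hypothetical stand-ins, not part of the code above) shows the same reference semantics:

```python
class Box:
    """Hypothetical stand-in for an extension type such as Cache_map."""

    def __init__(self):
        self.data = {}


def bump(box, key):
    # `box` refers to the caller's object; nothing is copied on the call.
    box.data[key] = box.data.get(key, 0) + 1
    return box


cache = Box()
same = bump(cache, 7)
print(same is cache)  # the callee saw the very same object
```

Because the callee mutates the caller's object directly, an extra pointer layer would not save any work here.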
Upvotes: 3
Views: 828
Reputation: 34377
From a software engineering point of view, I think it is a good idea to bundle the function (a function pointer or functor in C/cdef Cython) and its memoization together in one object/class.
My approach would be to write a cdef class (let's call it FunWithMemoization) which has a function pointer and a memoization data structure for storing known results.
Because life is too short to write C++ code in Cython, I have written the memoization class in pure C++ (the whole code can be found further below). It is very similar to your approach, but uses unordered_map, and I wrap and use it from Cython:
%%cython -+
from libcpp cimport bool
cdef extern from *:
    """
    // see full code below
    """
    struct memoization_result:
        long value;
        bool found;
    cppclass memoization:
        memoization()
        void set_value(long, long)
        memoization_result find_value(long key)

ctypedef long(*f_type)(long)

cdef long id_fun(long x):
    return x

cdef class FunWithMemoization:
    cdef memoization mem
    cdef f_type fun

    def __cinit__(self):
        self.fun = id_fun

    cpdef long evaluate(self, long x):
        cdef memoization_result look_up = self.mem.find_value(x)
        if look_up.found:
            return look_up.value
        cdef long val = self.fun(x)
        self.mem.set_value(x, val)
        return val
I have used id_fun to default-initialize the fun member, but we need further functionality to make FunWithMemoization useful, for example:
import time

cdef long f(long x):
    """Expensive function"""
    time.sleep(0.01)
    return x**2

def create_f_with_memoization():
    fun = FunWithMemoization()
    fun.fun = f
    return fun
There are obviously other approaches to creating a useful FunWithMemoization: one could, for example, use ctypes to get the addresses of functions, or this recipe.
And now:
f = create_f_with_memoization()
# first time really calculated:
%timeit -r 1 -n 1 f.evaluate(2)
# 10.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# second time - from memoization:
%timeit -r 1 -n 1 f.evaluate(2)
# 1.4 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
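The same bundling idea can be prototyped in pure Python before committing to C++. The sketch below is only an illustration: FifoMemo and its calls counter are hypothetical names, a dict and a deque stand in for unordered_map and queue, and it assumes the oldest key is dropped from both containers once max_size is exceeded.

```python
from collections import deque


class FifoMemo:
    """Hypothetical pure-Python analogue of FunWithMemoization."""

    def __init__(self, fun, max_size=128):
        self.fun = fun
        self.max_size = max_size
        self._values = {}      # key -> cached result
        self._order = deque()  # keys in insertion order, oldest first
        self.calls = 0         # number of real evaluations of `fun`

    def evaluate(self, x):
        if x in self._values:
            return self._values[x]
        val = self.fun(x)
        self.calls += 1
        self._values[x] = val
        self._order.append(x)
        if len(self._order) > self.max_size:
            # FIFO eviction: drop the oldest key from both containers.
            oldest = self._order.popleft()
            del self._values[oldest]
        return val


square = FifoMemo(lambda x: x * x, max_size=2)
results = [square.evaluate(k) for k in (1, 1, 2, 3, 1)]
print(results, square.calls)
```

With max_size=2 the key 1 is evicted when 3 arrives, so the final lookup of 1 is computed again.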
Whole code:
%%cython -+
from libcpp cimport bool
cdef extern from *:
    """
    #include <unordered_map>
    #include <queue>

    struct memoization_result{
        long value;
        bool found;
    };

    class memoization{
    private:
        std::unordered_map<long, long> map;
        std::queue<long> key_order;
        size_t max_size;
    public:
        memoization(): max_size(128){}
        void set_value(long key, long val){
            // assumes key isn't yet in map
            map[key] = val;
            key_order.push(key);
            if(key_order.size() > max_size){
                // evict the oldest entry from the map as well as the queue
                map.erase(key_order.front());
                key_order.pop();
            }
        }
        memoization_result find_value(long key) const{
            auto it = map.find(key);
            if(it == map.cend()){
                return {0, false};
            }
            else{
                return {it->second, true};
            }
        }
    };
    """
    struct memoization_result:
        long value;
        bool found;
    cppclass memoization:
        memoization()
        void set_value(long, long)
        memoization_result find_value(long key)

ctypedef long(*f_type)(long)

cdef long id_fun(long x):
    return x

cdef class FunWithMemoization:
    cdef memoization mem
    cdef f_type fun

    def __cinit__(self):
        self.fun = id_fun

    cpdef long evaluate(self, long x):
        cdef memoization_result look_up = self.mem.find_value(x)
        if look_up.found:
            return look_up.value
        cdef long val = self.fun(x)
        self.mem.set_value(x, val)
        return val

import time

cdef long f(long x):
    """Expensive function"""
    time.sleep(0.01)
    return x**2

def create_f_with_memoization():
    fun = FunWithMemoization()
    fun.fun = f
    return fun
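To get the per-call reset the question asks for, one option is to build a fresh memoizing wrapper at the start of each outer call, much as create_f_with_memoization returns a new object each time. The following is a minimal pure-Python sketch of that pattern, not the code above: make_fifo_cached, slow_square and the evaluations list are hypothetical names introduced for illustration.

```python
from collections import deque


def make_fifo_cached(fun, max_size=128):
    """Return a wrapper around `fun` with its own, initially empty FIFO cache."""
    values = {}
    order = deque()

    def cached(x):
        if x in values:
            return values[x]
        val = fun(x)
        values[x] = val
        order.append(x)
        if len(order) > max_size:
            # Drop the oldest key so the cache stays bounded.
            del values[order.popleft()]
        return val

    return cached


evaluations = []  # records every real call to the expensive function


def slow_square(x):
    evaluations.append(x)
    return x * x


def use_cache(n):
    cached = make_fifo_cached(slow_square, max_size=10)  # fresh cache per call
    return [cached(i) for i in range(n) for _ in range(2)]


first = use_cache(3)
second = use_cache(3)
print(first, evaluations)
```

Each call to use_cache starts with an empty cache, so the second call recomputes every value once while repeated lookups within a call are still served from the cache.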
Upvotes: 2