Reputation: 743
I want to cache some DataFrames in memory to speed up my program (calculate_df
is slow). My code looks like this:
class Foo:
    cache = {}

    @classmethod
    def get_df(cls, bar):
        if bar not in cls.cache:
            cls.cache[bar] = cls.calculate_df(bar)
        return cls.cache[bar]

    @classmethod
    def calculate_df(cls, bar):
        ......
        return df
Almost all of the time, the number of possible values of bar times the size of df fits into memory. However, I need to plan for cases where there are too many different bars, or the df objects are too big, so that my cache causes memory issues. I would like to check memory usage before I run cache[bar] = calculate_df(bar).
What is the right/best way to do such a memory check?
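A minimal sketch of the kind of check I have in mind (assuming the third-party psutil package is installed; cache_if_room is a hypothetical helper, not part of my actual code):

```python
import sys


def cache_if_room(cache, key, value, headroom_bytes=256 * 1024 * 1024):
    """Store value in cache only if enough free memory would remain.

    Uses psutil to read available system memory when it is installed;
    falls back to caching unconditionally otherwise. The 256 MiB
    headroom is an arbitrary illustrative threshold.
    """
    try:
        import psutil
        available = psutil.virtual_memory().available
    except ImportError:
        available = None  # cannot measure; cache unconditionally

    # sys.getsizeof underestimates nested objects; for a pandas
    # DataFrame, df.memory_usage(deep=True).sum() is more accurate.
    size = sys.getsizeof(value)

    if available is None or available - size > headroom_bytes:
        cache[key] = value
        return True
    return False
```

I could then replace the direct assignment in get_df with a call like cache_if_room(cls.cache, bar, cls.calculate_df(bar)), but I am not sure this is the right approach.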
Upvotes: 1
Views: 1131
Reputation: 10709
Instead of managing memory manually at that level in Python, you might want to consider the decorator functools.lru_cache(), which lets you limit the number of cached items via maxsize. Once maxsize is reached, the least recently used entries are evicted.
@functools.lru_cache(maxsize=128, typed=False)
Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls.
Sample usage
from functools import lru_cache

class MyClass:
    @lru_cache(maxsize=3)
    def duplicate(self, num):
        print("called for", num)
        return num * 2

obj = MyClass()
for num in [12, 7, 12, 12, 7, 5, 15, 5, 7, 12]:
    print(num, "=", obj.duplicate(num))
Output
called for 12
12 = 24
called for 7
7 = 14
12 = 24
12 = 24
7 = 14
called for 5
5 = 10
called for 15
15 = 30
5 = 10
7 = 14
called for 12
12 = 24
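As a side note, the wrapped function also exposes cache_info() and cache_clear() for inspecting and resetting the cache. A short sketch against the same class as above:

```python
from functools import lru_cache


class MyClass:
    @lru_cache(maxsize=3)
    def duplicate(self, num):
        return num * 2


obj = MyClass()
for num in [12, 7, 12, 12, 7, 5, 15, 5, 7, 12]:
    obj.duplicate(num)

# Hit/miss counters for the call sequence above:
info = MyClass.duplicate.cache_info()
print(info)  # CacheInfo(hits=5, misses=5, maxsize=3, currsize=3)

# Drop all cached entries:
MyClass.duplicate.cache_clear()
```

One caveat: applying lru_cache to an instance method keeps a reference to self in the cache keys, so instances stay alive as long as their entries remain cached.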
Upvotes: 1