abisko

Reputation: 743

python check memory usage and stop caching if memory usage is too high

I hope to cache some DataFrames in memory to speed up my program (calculate_df is slow). My code is like

class Foo:
    cache = {}

    @classmethod
    def get_df(cls, bar):
        if bar not in cls.cache:
            cls.cache[bar] = cls.calculate_df(bar)
        return cls.cache[bar]

    @classmethod
    def calculate_df(cls, bar):
        ...  # slow computation elided
        return df

Almost all of the time, the number of possible bar values times the size of each df fits into memory. However, I need to plan for cases where there are too many different bars, or the dfs are big enough, that my cache causes memory issues. I would like to check memory usage before running cache[bar] = calculate_df(bar).

What is the right/best way to do such memory checks?

Upvotes: 1

Views: 1131

Answers (1)

Niel Godfrey P. Ponciano

Reputation: 10709

Instead of manually managing memory at that level in Python, you might want to consider the functools.lru_cache() decorator, which limits the number of items that can be stored at any given point in time to maxsize. Once maxsize is reached, the least recently used entries are evicted.

@functools.lru_cache(maxsize=128, typed=False)

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls

Sample usage

from functools import lru_cache

class MyClass:
    @lru_cache(maxsize=3)
    def duplicate(self, num):
        print("called for", num)
        return num * 2

obj = MyClass()
for num in [12, 7, 12, 12, 7, 5, 15, 5, 7, 12]:
    print(num, "=", obj.duplicate(num))

Output

called for 12
12 = 24
called for 7
7 = 14
12 = 24
12 = 24
7 = 14
called for 5
5 = 10
called for 15
15 = 30
5 = 10
7 = 14
called for 12
12 = 24
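Applied to the class in the question, the manual cache dict can be dropped entirely by stacking @classmethod over @lru_cache. A minimal sketch (calculate_df here is a trivial stand-in for the slow computation; the real one would build a DataFrame):

```python
from functools import lru_cache

class Foo:
    @classmethod
    @lru_cache(maxsize=128)  # bounds how many results are kept at once
    def get_df(cls, bar):
        # cls is part of the cache key, so subclasses get separate entries
        return cls.calculate_df(bar)

    @classmethod
    def calculate_df(cls, bar):
        # stand-in for the slow DataFrame computation
        return bar * 2

print(Foo.get_df(3))  # computed
print(Foo.get_df(3))  # served from the cache
```

Note that maxsize bounds the number of cached entries, not their total size in bytes, so you still need to pick a maxsize small enough that that many DataFrames fit in memory.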

Upvotes: 1
