Hyperboreus

Reputation: 32429

Caching and memory usage

I have a lot of classes that all do the same thing: they receive an identifier (the PK in the DB) during construction and are then loaded from the DB. I am trying to cache the instances of these classes in order to minimize calls down to the DB. When the cache reaches a critical size, it should discard the cached objects that have been accessed least recently.
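The least-recently-used eviction described above could be sketched with `collections.OrderedDict` (this is only an illustration of the policy, not the asker's code; `MAX_SIZE` is a made-up limit):

```python
from collections import OrderedDict

MAX_SIZE = 3  # hypothetical cache limit for the sketch

class LRUCache:
    """Discards the least recently accessed entry once MAX_SIZE is exceeded."""
    def __init__(self):
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > MAX_SIZE:
            self.entries.popitem(last=False)  # evict least recently used
```

`OrderedDict.move_to_end` is available from Python 3.2, which matches the shebang used below.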

The caching actually seems to work fine, but somehow I cannot determine the memory usage of the cache (in the line after #Next line doesn't do what I expected).

My code so far:

#! /usr/bin/python3.2

from datetime import datetime
import random
import sys

class Cache:
    instance = None

    def __new__(cls):
        if not cls.instance:
            cls.instance = super().__new__(cls)
            cls.instance.classes = {}
        return cls.instance

    def getObject(self, cls, ident):
        if cls not in self.classes:
            return None
        instances = self.classes[cls]
        if ident not in instances:
            return None
        return instances[ident]

    def cache(self, obj):
        #Next line doesn't do what I expected
        print(sys.getsizeof(self.classes))
        if obj.__class__ not in self.classes:
            self.classes[obj.__class__] = {}
        instances = self.classes[obj.__class__]
        instances[obj.ident] = (obj, datetime.now())


class Cached:
    def __init__(self, cache):
        self.cache = cache

    def __call__(self, cls):
        cls.cache = self.cache

        oNew = cls.__new__
        def new(cls, ident):
            cached = cls.cache().getObject(cls, ident)
            if not cached:
                #object.__new__ takes no extra arguments; ident is consumed by init
                return oNew(cls)
            cls.cache().cache(cached[0])
            return cached[0]
        cls.__new__ = new

        def init(self, ident):
            if hasattr(self, 'ident'):
                return
            self.ident = ident
            self.load()
        cls.__init__ = init

        oLoad = cls.load
        def load(self):
            oLoad(self)
            self.cache().cache(self)
        cls.load = load

        return cls


@Cached(Cache)
class Person:
    def load(self):
        print('Expensive call to DB')
        print('Loading Person {}'.format(self.ident))
        #Just simulating
        self.name = random.choice(['Alice', 'Bob', 'Mallroy'])

@Cached(Cache)
class Animal:
    def load(self):
        print('Expensive call to DB')
        print('Loading Animal {}'.format(self.ident))
        #Just simulating
        self.species = random.choice(['Dog', 'Cat', 'Iguana'])

sys.getsizeof returns funny values.

How can I determine the actual memory usage of all cached objects?

Upvotes: 1

Views: 644

Answers (1)

Sheena

Reputation: 16212

getsizeof is pretty tricksy; here's an illustration of the fact:

getsizeof([])       # returns 72   ------------A
getsizeof([1,])     # returns 80   ------------B
getsizeof(1)        # returns 24   ------------C
getsizeof([[1,],])  # returns 80   ------------D
getsizeof([[1,],1]) # returns 88   ------------E

Here's some stuff worth noting:

  • A: the size of an empty list is 72
  • B: the size of a list containing 1 is 8 bytes more
  • C: the size of 1 on its own is 24 bytes, not 8. The reason for this apparent mismatch is that the integer 1 exists separately from the list as its own object, so line C returns the size of that object, while B returns the size of an empty list plus one reference to it.
  • D: This is thus the size of an empty list plus one reference to a different list
  • E: an empty list plus two references = 88 bytes
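The pattern in B, D and E can be checked directly: getsizeof counts only the references a container holds, not the objects behind them, so growing the inner list leaves the outer list's reported size unchanged. (Exact byte counts vary by Python build, so this sketch only compares sizes rather than asserting specific numbers.)

```python
import sys

small_inner = [1]
big_inner = list(range(1000))

# Each outer list stores exactly one reference, so both report the
# same size no matter how large the inner list is.
print(sys.getsizeof([small_inner]) == sys.getsizeof([big_inner]))  # True
```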

What I'm getting at is that getsizeof only reports the size of the object itself, never the sizes of the objects it refers to. To measure the whole cache, you need the size of each thing plus the sizes of the things it refers to, and so on all the way down. This smells like recursion.
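A minimal sketch of that recursive idea (the recipe linked below is far more thorough; this version only follows dicts, common containers and instance `__dict__`s, and uses a `seen` set to avoid double-counting shared or cyclic references):

```python
import sys

def total_size(obj, seen=None):
    """Rough deep size: getsizeof of obj plus everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # already counted (shared or cyclic reference)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    if hasattr(obj, '__dict__'):  # follow instance attributes too
        size += total_size(obj.__dict__, seen)
    return size
```

Calling `total_size` on the cache's top-level dict would then account for the nested per-class dicts, the cached instances and their attributes, rather than just the outer dict's reference table.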

Check out this recipe; it might help you out: http://code.activestate.com/recipes/546530/

Upvotes: 1
