anthony

Reputation: 41098

How to limit the size of a dictionary?

I'd like to work with a dict in Python, but limit the number of key/value pairs to X. In other words, if the dict is currently storing X key/value pairs and I perform an insertion, I would like one of the existing pairs to be dropped. It would be nice if it were the least recently inserted/accessed key, but that's not strictly necessary.

If this exists in the standard library please save me some time and point it out!

Upvotes: 64

Views: 49442

Answers (8)

Haru Kaeru

Reputation: 146

There is a library called CircularDict that implements this behaviour. It lets you limit the maximum number of items the dict can store, and also set memory usage limits.

It can be installed with:

pip install circular-dict

And used this way:

from circular_dict import CircularDict

# Initialize a CircularDict with a maximum length of 3
my_dict = CircularDict(maxlen=3) # You could also set maxsize_bytes=8*1024 bytes

# Fill it with 4 items
my_dict['item1'] = 'value1'
my_dict['item2'] = 'value2'
my_dict['item3'] = 'value3'
# When adding this 4th item, the 1st one will be dropped
my_dict['item4'] = 'value4'
print(my_dict)

The output will look like this:

{'item2': 'value2', 'item3': 'value3', 'item4': 'value4'}

Upvotes: 0

Raymond Hettinger

Reputation: 226296

Here is a simple and efficient LRU cache written in dirt-simple Python that runs on any Python version 1.5.2 or later:

class LRU_Cache:

    def __init__(self, original_function, maxsize=1000):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

        PREV, NEXT, KEY, VALUE = 0, 1, 2, 3         # link fields
        self.head = [None, None, None, None]        # oldest
        self.tail = [self.head, None, None, None]   # newest
        self.head[NEXT] = self.tail

    def __call__(self, *key):
        PREV, NEXT = 0, 1
        mapping, head, tail = self.mapping, self.head, self.tail

        link = mapping.get(key, head)
        if link is head:
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                old_prev, old_next, old_key, old_value = head[NEXT]
                head[NEXT] = old_next
                old_next[PREV] = head
                del mapping[old_key]
            last = tail[PREV]
            link = [last, tail, key, value]
            mapping[key] = last[NEXT] = tail[PREV] = link
        else:
            link_prev, link_next, key, value = link
            link_prev[NEXT] = link_next
            link_next[PREV] = link_prev
            last = tail[PREV]
            last[NEXT] = tail[PREV] = link
            link[PREV] = last
            link[NEXT] = tail
        return value

if __name__ == '__main__':
    p = LRU_Cache(pow, maxsize=3)
    for i in [1,2,3,4,5,3,1,5,1,1]:
        print(i, p(i, 2))

Upvotes: 9

Roger Pate

Reputation:

Python 2.7 and 3.1 have OrderedDict and there are pure-Python implementations for earlier Pythons.

from collections import OrderedDict

class LimitedSizeDict(OrderedDict):
    def __init__(self, *args, **kwds):
        self.size_limit = kwds.pop("size_limit", None)
        OrderedDict.__init__(self, *args, **kwds)
        self._check_size_limit()

    def __setitem__(self, key, value):
        OrderedDict.__setitem__(self, key, value)
        self._check_size_limit()

    def _check_size_limit(self):
        if self.size_limit is not None:
            while len(self) > self.size_limit:
                self.popitem(last=False)

You would also have to override other methods that can insert items, such as update. The main reason to use OrderedDict is that it lets you easily control what gets popped; otherwise a normal dict would work.
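
As a rough sketch of guarding the other insertion paths, the variant below also wraps update and setdefault. The class name BoundedOrderedDict is hypothetical, and whether these overrides are strictly needed depends on the OrderedDict implementation, since some already route update through __setitem__; the extra check is harmless either way.

```python
from collections import OrderedDict


class BoundedOrderedDict(OrderedDict):
    """Hypothetical variant of LimitedSizeDict that also guards update() and setdefault()."""

    def __init__(self, *args, **kwds):
        # size_limit must be set before the base __init__ inserts any items.
        self.size_limit = kwds.pop("size_limit", None)
        OrderedDict.__init__(self, *args, **kwds)
        self._check_size_limit()

    def __setitem__(self, key, value):
        OrderedDict.__setitem__(self, key, value)
        self._check_size_limit()

    def update(self, *args, **kwds):
        OrderedDict.update(self, *args, **kwds)
        self._check_size_limit()

    def setdefault(self, key, default=None):
        value = OrderedDict.setdefault(self, key, default)
        self._check_size_limit()
        return value

    def _check_size_limit(self):
        # Drop the oldest entries until the mapping fits the limit again.
        if self.size_limit is not None:
            while len(self) > self.size_limit:
                self.popitem(last=False)


d = BoundedOrderedDict(size_limit=3)
d.update([("a", 1), ("b", 2), ("c", 3), ("d", 4)])
print(list(d))  # the oldest key, "a", has been dropped
```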

Upvotes: 60

Alex Martelli

Reputation: 881635

Here's a simple, no-LRU Python 2.6+ solution (in older Pythons you could do something similar with UserDict.DictMixin, but in 2.6 and better that's not recommended, and the ABCs from collections are preferable anyway...):

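try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping  # Python 2.6 and 2.7

class MyDict(MutableMapping):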
    def __init__(self, maxlen, *a, **k):
        self.maxlen = maxlen
        self.d = dict(*a, **k)
        while len(self) > maxlen:
            self.popitem()
    def __iter__(self):
        return iter(self.d)
    def __len__(self):
        return len(self.d)
    def __getitem__(self, k):
        return self.d[k]
    def __delitem__(self, k):
        del self.d[k]
    def __setitem__(self, k, v):
        if k not in self and len(self) == self.maxlen:
            self.popitem()
        self.d[k] = v

d = MyDict(5)
for i in range(10):
    d[i] = i
    print(sorted(d))

As other answers mentioned, you probably don't want to subclass dict -- the explicit delegation to self.d is unfortunately boilerplatey but it does guarantee that every other method is properly supplied by collections.MutableMapping.
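
To illustrate the point about delegation, here is a small experiment, assuming Python 3 (it imports from collections.abc). The two class names are made up for the demonstration; each one just counts how often its own __setitem__ runs when update is called.

```python
from collections.abc import MutableMapping


class CountingDict(dict):
    """dict subclass that counts calls to its own __setitem__."""

    def __init__(self):
        super().__init__()
        self.set_calls = 0

    def __setitem__(self, key, value):
        self.set_calls += 1
        super().__setitem__(key, value)


class CountingMapping(MutableMapping):
    """Delegating mapping; the ABC supplies update() on top of __setitem__."""

    def __init__(self):
        self.d = {}
        self.set_calls = 0

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        self.set_calls += 1
        self.d[key] = value

    def __delitem__(self, key):
        del self.d[key]

    def __iter__(self):
        return iter(self.d)

    def __len__(self):
        return len(self.d)


sub = CountingDict()
sub.update(a=1, b=2)        # dict.update bypasses the overridden __setitem__

abc_based = CountingMapping()
abc_based.update(a=1, b=2)  # MutableMapping.update assigns via self[key] = value

print(sub.set_calls, abc_based.set_calls)  # prints: 0 2
```

A size check placed in __setitem__ is therefore enforced by update only in the delegating version.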

Upvotes: 16

Ian Chen

Reputation: 335

There have been many good answers, but I want to point out a simple, Pythonic implementation of an LRU cache. It's similar to Alex Martelli's answer.

from collections import OrderedDict
from collections.abc import MutableMapping  # moved out of collections; the old import fails on Python 3.10+

class Cache(MutableMapping):
    def __init__(self, maxlen, items=None):
        self._maxlen = maxlen
        self.d = OrderedDict()
        if items:
            for k, v in items:
                self[k] = v

    @property
    def maxlen(self):
        return self._maxlen

    def __getitem__(self, key):
        self.d.move_to_end(key)
        return self.d[key]

    def __setitem__(self, key, value):
        if key in self.d:
            self.d.move_to_end(key)
        elif len(self.d) == self.maxlen:
            self.d.popitem(last=False)
        self.d[key] = value

    def __delitem__(self, key):
        del self.d[key]

    def __iter__(self):
        return self.d.__iter__()

    def __len__(self):
        return len(self.d)

Upvotes: 6

vaab

Reputation: 10122

cachetools provides nice implementations of mappings that do this (and it works on Python 2 and 3).

Excerpt of the documentation:

For the purpose of this module, a cache is a mutable mapping of a fixed maximum size. When the cache is full, i.e. by adding another item the cache would exceed its maximum size, the cache must choose which item(s) to discard based on a suitable cache algorithm.

Upvotes: 23

Mike Graham

Reputation: 76683

A dict does not have this behavior. You could make your own class that does this, for example something like

class MaxSizeDict(object):
    def __init__(self, max_size):
        self.max_size = max_size
        self.dict = {}
    def __setitem__(self, key, value):
        if key in self.dict:
            self.dict[key] = value    
            return

        if len(self.dict) >= self.max_size:
      ...

A few notes about this

  • It would be tempting for some to subclass dict here. You can technically do this, but it is bug-prone because dict's methods do not call each other (overriding __setitem__ does not change what update does, for example). You can use UserDict.DictMixin to save having to define all methods. There are few methods you would be able to reuse if you subclass dict.
  • A dict does not know what the least recently added key is, since dicts are unordered.
    • 2.7 will introduce collections.OrderedDict, but for now keeping the keys in order separately should work fine (use a collections.deque as a queue).
    • If getting the oldest isn't all that important, you can just use the popitem method to delete one arbitrary item.
  • I interpreted oldest to mean first insertion, approximately. You would have to do something a bit different to eliminate the LRU items. The most obvious efficient strategy would involve keeping a doubly-linked list of keys with references to the nodes themselves stored as dict values (along with the real values). This gets more complicated and implementing it in pure Python carries a lot of overhead.
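
The deque idea from the notes above could look roughly like this. It is a separate sketch, not a completion of the truncated class: the name FifoDict and its attributes are hypothetical, eviction is by first insertion (not LRU), and max_size is assumed to be at least 1.

```python
from collections import deque


class FifoDict(object):
    """Hypothetical size-capped mapping that evicts the first-inserted key."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.data = {}
        self.order = deque()  # keys in insertion order, oldest on the left

    def __setitem__(self, key, value):
        if key not in self.data:
            if len(self.data) >= self.max_size:
                # Full: drop the key that was inserted first.
                oldest = self.order.popleft()
                del self.data[oldest]
            self.order.append(key)
        # Overwriting an existing key keeps its original position.
        self.data[key] = value

    def __getitem__(self, key):
        return self.data[key]

    def __delitem__(self, key):
        del self.data[key]
        self.order.remove(key)  # linear scan of the deque

    def __contains__(self, key):
        return key in self.data

    def __len__(self):
        return len(self.data)


d = FifoDict(2)
d["a"] = 1
d["b"] = 2
d["c"] = 3  # "a" was inserted first, so it is evicted
print(sorted(d.data))
```

Explicit deletion costs a linear scan here; the linked-list approach described in the last note avoids that at the price of more bookkeeping.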

Upvotes: 2

user150340

Reputation:

You can create a custom dictionary class by subclassing dict. In your case, you would have to override __setitem__ to check the length and delete something if the limit is reached. The following example prints the current length after every insertion:

class mydict(dict):
    def __setitem__(self, k, v):
        dict.__setitem__(self, k, v)
        print(len(self))  # parenthesized form works on both Python 2 and 3

d = mydict()
d['foo'] = 'bar'
d['bar'] = 'baz'
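
Going one step further, a minimal sketch of the eviction itself might look like this. The name CappedDict and the max_len parameter are made up for illustration, and the code assumes Python 3.7+ (where a dict iterates in insertion order) and max_len of at least 1.

```python
class CappedDict(dict):
    """Hypothetical dict subclass that keeps at most max_len items."""

    def __init__(self, max_len):
        dict.__init__(self)
        self.max_len = max_len  # assumed to be >= 1

    def __setitem__(self, k, v):
        if k not in self and len(self) >= self.max_len:
            # On Python 3.7+ the first key yielded is the oldest insertion.
            oldest = next(iter(self))
            dict.__delitem__(self, oldest)
        dict.__setitem__(self, k, v)


d = CappedDict(2)
d['foo'] = 'bar'
d['bar'] = 'baz'
d['baz'] = 'qux'  # 'foo' is dropped to make room
print(list(d))    # prints: ['bar', 'baz']
```

Only item assignment is guarded in this sketch; update and setdefault on a dict subclass do not pass through __setitem__.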

Upvotes: 2
