Efficient list manipulation in python

Question

I have a large list and regularily need to find an item satisfying a rather complex condition (not equality), i.e. I am forced to check every item in the list until I find one. The conditions change, but some items match more often then others. So I would like to bring the matching item to the front of the list each time I find one, so frequently matching items are found more quickly.

Is there an efficient, pythonic way to do this?

Sequences ([]) are backed by an array, so removing an item somewhere in the middle and prepending it to the array means moving every previous item. That's in O(n) time, not good.

In C you could build a linked list and move the item on your own when found. In Python there is a deque, but afaik you cannot reference the node objects nor have access to .next pointers.

And a self-made linked list is very slow in Python. (In fact it's slower than ordinary linear search without moving any item.)

Sadly, a dict or set finds items based on value equality and thus doesn't fit my problem.

As an illustration, here's the condition:

u, v, w = n.value   # list item
if v in g[u] and w in g[v] and u not in g[w]:
    ...

Alex Martelli · Accepted Answer

Consider instead a Pythonic approach. As Ed Post once put it, "The determined Real Programmer can write FORTRAN programs in any language" -- and this generalizes... you're trying to write C in Python and it isn't working well for you:-)

Rather, think of putting an auxiliary dict cache next to the list -- caching the indices where items are found (needs to be invalidated only on "deep" changes to the list's structure). Much simpler and faster...

Probably best done by having list and dict in a small class:

class Seeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}

    def find(self, value):
        where = self.d.get(value)
        if where is None:
            self.d[value] = where = self.l.find(value)
        return where

    def __setitem__(self, index, value):
        if value in self.d: del self.d[value]
        self.l[index] = value

    # and so on for other mutators that invalidate self.d; then,

    def __getattr__(self, name):
        # delegate everything else to the list
        return getattr(self.l, name)

You need only define the mutators you actually need to use -- e.g, if you won't do insert, sort, __delitem__, &c, no need to define those, you can just delegate them to the list.

Added: in Python 3.2 or better, functools.lru_cache can actually do most of the work for you -- use it to decorate find and you'll get a better implementation of caching, with the ability to limit cache size if you so desire. To clear the cache, you'll need to call self.find.cache_clear() at the appropriate spots (where I above use self.d = {}) -- unfortunately, that crucial functionality is not (yet!-) documented (the volunteers updating the docs are not the same ones updating the code...!-)... but, trust me, it's not going to disappear on you:-).

Added: the OP edited the Q to clarify that he's not after "value equality", but rather some more complex set of conditions, exemplified by a predicate such as:

def good_for_g(g, n):
    # for some container `g` and item value `n`:
    u, v, w = n.value
    return v in g[u] and w in g[v] and u not in g[w]

Presumably, then, the desire to bring "good" items towards the front is in turn predicated on their "goodness" being "sticky", i.e, g staying pretty much the same for a while. In this case, one can use the predicate one as a feature extraction and checking function, which forms the key into the dictionary -- so for example:

class FancySeeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}

    def _find_in_list(self, predicate):
        for i, n in enumerate(self.l):
            if predicate(n):
                return i
        return -1

    def find(self, predicate):
        where = self.d.get(predicate)
        if where is None:
            where = self._find_in_list(predicate)
            self.d[predicate] = where
        return where

and so forth.

So the remaining difficulty is to put predicate in a form suitable for effective indexing into a dict. If predicate is just a function, no problem. But if predicate is a function with parameters, as formed e.g by functools.partial or as a bound method of some instance, that requires a bit of further processing/wrapping to make the indexing work.

Two calls to functools.partial with the same bound argument(s) and function, for example, do not return equal objects -- one has, rather, to inspect the .args and .func of the returned objects to ensure, so to speak, a "singleton" is returned for any given (func, args) pair.

Moreover, if some of the bound arguments are mutable, one needs to use their id in lieu of their hash (or else the raw functools.partial object would not be hashable). It gets even hairier for bound methods, though they can similarly be wrapped into e.g a hashable, "equality adjusted" Predicate class.

Lastly, if these gyrations prove too cumbersome and you really want a fast implementation of a linked list instead, look at https://pypi.python.org/pypi/llist/0.4 -- it's a C-coded implementation of singly and doubly linked lists for Python (for each kind, it implements three types: the list itself, the list node, and the list's iterator).

Efficient list manipulation in python

Answers (2)

Related Questions