Reputation: 1624
I have a large list and regularily need to find an item satisfying a rather complex condition (not equality), i.e. I am forced to check every item in the list until I find one. The conditions change, but some items match more often then others. So I would like to bring the matching item to the front of the list each time I find one, so frequently matching items are found more quickly.
Is there an efficient, pythonic way to do this?
Sequences ([]
) are backed by an array, so removing an item somewhere in the middle and prepending it to the array means moving every previous item. That's in O(n) time, not good.
In C you could build a linked list and move the item on your own when found. In Python there is a deque
, but afaik you cannot reference the node objects nor have access to .next
pointers.
And a self-made linked list is very slow in Python. (In fact it's slower than ordinary linear search without moving any item.)
Sadly, a dict
or set
finds items based on value equality and thus doesn't fit my problem.
As an illustration, here's the condition:
u, v, w = n.value # list item
if v in g[u] and w in g[v] and u not in g[w]:
...
Upvotes: 3
Views: 1964
Reputation: 882291
Consider instead a Pythonic approach. As Ed Post once put it, "The determined Real Programmer can write FORTRAN programs in any language" -- and this generalizes... you're trying to write C in Python and it isn't working well for you:-)
Rather, think of putting an auxiliary dict
cache next to the list
-- caching the indices where items are found (needs to be invalidated only on "deep" changes to the list's structure). Much simpler and faster...
Probably best done by having list
and dict
in a small class:
class Seeker(object):
def __init__(self, *a, **k):
self.l = list(*a, **k)
self.d = {}
def find(self, value):
where = self.d.get(value)
if where is None:
self.d[value] = where = self.l.find(value)
return where
def __setitem__(self, index, value):
if value in self.d: del self.d[value]
self.l[index] = value
# and so on for other mutators that invalidate self.d; then,
def __getattr__(self, name):
# delegate everything else to the list
return getattr(self.l, name)
You need only define the mutators you actually need to use -- e.g, if you won't do insert
, sort
, __delitem__
, &c, no need to define those, you can just delegate them to the list.
Added: in Python 3.2 or better, functools.lru_cache
can actually do most of the work for you -- use it to decorate find
and you'll get a better implementation of caching, with the ability to limit cache size if you so desire. To clear the cache, you'll need to call self.find.cache_clear()
at the appropriate spots (where I above use self.d = {}
) -- unfortunately, that crucial functionality is not (yet!-) documented (the volunteers updating the docs are not the same ones updating the code...!-)... but, trust me, it's not going to disappear on you:-).
Added: the OP edited the Q to clarify that he's not after "value equality", but rather some more complex set of conditions, exemplified by a predicate such as:
def good_for_g(g, n):
# for some container `g` and item value `n`:
u, v, w = n.value
return v in g[u] and w in g[v] and u not in g[w]
Presumably, then, the desire to bring "good" items towards the front is in turn predicated on their "goodness" being "sticky", i.e, g
staying pretty much the same for a while. In this case, one can use the predicate one as a feature extraction and checking function, which forms the key into the dictionary -- so for example:
class FancySeeker(object):
def __init__(self, *a, **k):
self.l = list(*a, **k)
self.d = {}
def _find_in_list(self, predicate):
for i, n in enumerate(self.l):
if predicate(n):
return i
return -1
def find(self, predicate):
where = self.d.get(predicate)
if where is None:
where = self._find_in_list(predicate)
self.d[predicate] = where
return where
and so forth.
So the remaining difficulty is to put predicate
in a form suitable for effective indexing into a dict
. If predicate
is just a function, no problem. But if predicate
is a function with parameters, as formed e.g by functools.partial
or as a bound method of some instance, that requires a bit of further processing/wrapping to make the indexing work.
Two calls to functools.partial
with the same bound argument(s) and function, for example, do not return equal objects -- one has, rather, to inspect the .args
and .func
of the returned objects to ensure, so to speak, a "singleton" is returned for any given (func, args)
pair.
Moreover, if some of the bound arguments are mutable, one needs to use their id
in lieu of their hash
(or else the raw functools.partial
object would not be hashable). It gets even hairier for bound methods, though they can similarly be wrapped into e.g a hashable, "equality adjusted" Predicate
class.
Lastly, if these gyrations prove too cumbersome and you really want a fast implementation of a linked list instead, look at https://pypi.python.org/pypi/llist/0.4 -- it's a C-coded implementation of singly and doubly linked lists for Python (for each kind, it implements three types: the list itself, the list node, and the list's iterator).
Upvotes: 3
Reputation: 40778
You can do exactly what you want using deque.rotate
.
from collections import deque
class Collection:
"Linked List collection that moves searched for items to the front of the collection"
def __init__(self, seq):
self._deque = deque(seq)
def __contains__(self, target):
for i, item in enumerate(self._deque):
if item == target:
self._deque.rotate(i)
self._deque.popleft()
self._deque.rotate(-i+1)
self._deque.appendleft(item)
return True
return False
def __str__(self):
return "Collection({})".format(str(self._deque))
c = Collection(range(10))
print(c)
print("5 in d:", 5 in c)
print(c)
Gives the following output:
Collection(deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
5 in c: True
Collection(deque([5, 0, 1, 2, 3, 4, 6, 7, 8, 9]))
Upvotes: 0