Reputation: 133
I'm hoping someone can explain why searching a list of object references is so much slower than searching a normal list. This uses Python's "in" keyword to search, which I thought ran at C speed. I thought a list is just an array of object references (pointers), so the search should be extremely fast. Both lists are exactly 412236 bytes in memory.
Normal list (takes 0.000 seconds to search):
alist = ['a' for x in range(100000)]
if 'b' in alist:
print("Found")
List of object references (takes 0.469 !! seconds to search):
class Spam:
pass
spamlist = [Spam() for x in range(100000)]
if Spam() in spamlist:
print("Found")
2. [obj for obj in spamlist if obj is target]
3. Some other more Pythonic way?
Upvotes: 3
Views: 5827
Reputation: 281121
This is mostly due to the different special method lookup mechanics of old-style classes.
>>> timeit.timeit("Spam() in l", """
... # Old-style
... class Spam: pass
... l = [Spam() for i in xrange(100000)]""", number=10)
3.0454677856675403
>>> timeit.timeit("Spam() in l", """
... # New-style
... class Spam(object): pass
... l = [Spam() for i in xrange(100000)]""", number=10)
0.05137817007346257
>>> timeit.timeit("'a' in l", 'l = ["b" for i in xrange(100000)]', number=10)
0.03013876870841159
As you can see, the version where Spam inherits from object runs much faster, almost as fast as the case with strings.
The in operator for lists uses == to compare items for equality. == is defined to try the objects' __eq__ methods, their __cmp__ methods, and pointer comparison, in that order.
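A small Python 3 illustration of that fallback chain (__cmp__ and __coerce__ are Python 2 only, so only the __eq__ step and the final pointer comparison can be shown; the class name here is made up for the demo):

```python
# Sketch: when __eq__ declines by returning NotImplemented, == falls
# back to pointer (identity) comparison, the last step in the chain.

class Stubborn:
    def __eq__(self, other):
        return NotImplemented   # decline the comparison, forcing fallback

a, b = Stubborn(), Stubborn()
print(a == b)   # False: both __eq__ calls decline, pointer comparison decides
print(a == a)   # True:  fallback pointer comparison of the same object
```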
For old-style classes, this is implemented in a straightforward but slow manner. Python has to actually look for the __eq__
and __cmp__
methods in each instance's dict and the dicts of each instance's class and superclasses. __coerce__
gets looked up too, as part of the 3-way compare process. When none of these methods actually exist, that's something like 12 dict lookups just to get to the pointer comparison. There's a bunch of other overhead besides the dict lookups, and I'm not actually sure which aspects of the process are the most time-consuming, but suffice it to say that the procedure is more expensive than it could be.
For built-in types and new-style classes, things are better. First, Python doesn't look for special methods on the instance's dict. This saves some dict lookups and enables the next part. Second, type objects have C-level function pointers corresponding to the Python-level special methods. When a special method is implemented in C or doesn't exist, the corresponding function pointer allows Python to skip the method lookup procedure entirely. This means that in the new-style case, Python can quickly detect that it should skip straight to the pointer comparison.
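A hedged sketch of the slot mechanism described above, written for Python 3 (where every class is new-style, so the old-style case cannot be reproduced; the class names are invented for the demo): a class with no __eq__ lets the C-level slot skip straight to pointer comparison, while a Python-level __eq__ forces a method call per element.

```python
import timeit

# Sketch (Python 3). A class with no __eq__ inherits object's C-level
# comparison slot, so `in` falls through to fast pointer comparison;
# defining __eq__ in Python forces a method call for every element.

class Plain:
    pass

class Custom:
    def __eq__(self, other):
        return self is other   # same answer as identity, but via a method call

plain_list = [Plain() for _ in range(100_000)]
custom_list = [Custom() for _ in range(100_000)]

t_plain = timeit.timeit(lambda: Plain() in plain_list, number=10)
t_custom = timeit.timeit(lambda: Custom() in custom_list, number=10)

print(f"without __eq__: {t_plain:.3f}s")
print(f"with __eq__:    {t_custom:.3f}s")   # noticeably slower
```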
As for what you should do, I'd recommend using in and new-style classes. If you find that this operation is becoming a bottleneck, but you need old-style classes for backward compatibility, any(x is y for y in l) runs about 20 times faster than x in l:
>>> timeit.timeit('x in l', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
2.8618816054721936
>>> timeit.timeit('any(x is y for y in l)', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
0.12331640524583776
Upvotes: 6
Reputation: 16940
This is not the right answer to your question, but it is very good background for anyone who wants to understand how the 'in' keyword works under the hood:
ceval.c source code
abstract.c source code
Mail thread about the 'in' keyword
Explanation from the mail thread:
I'm curious enough about this (OK, I admit it, I like to be right, too ;) to dig in to the details, if anyone is interested...one of the benefits of Python being open-source is you can find out how it works...
First step, look at the bytecodes:
>>> import dis
>>> def f(x, y):
... return x in y
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
So in is implemented as a COMPARE_OP. Looking in ceval.c for COMPARE_OP, it has some optimizations for a few fast compares, then calls cmp_outcome() which, for 'in', calls PySequence_Contains().
PySequence_Contains() is implemented in abstract.c. If the container implements __contains__, that is called; otherwise _PySequence_IterSearch() is used.
_PySequence_IterSearch() calls PyObject_GetIter() to construct an iterator on the sequence, then goes into an infinite loop (for (;;)) calling PyIter_Next() on the iterator until the item is found or the call to PyIter_Next() returns an error.
PyObject_GetIter() is also in abstract.c. If the object has an __iter__() method, that is called; otherwise PySeqIter_New() is called to construct an iterator.
PySeqIter_New() is implemented in iterobject.c. Its next() method is iter_iternext(). This method calls __getitem__() on its wrapped object and increments an index for next time.
So, though the details are complex, I think it is pretty fair to say that the implementation uses a while loop (in _PySequence_IterSearch()) and a counter (wrapped in PySeqIter_Type) to implement 'in' on a container that defines __getitem__ but not __iter__.
By the way, the implementation of 'for' also calls PyObject_GetIter(), so it uses the same mechanism to generate an iterator for a sequence that defines __getitem__().
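The legacy protocol described in this mail can be seen from Python directly. A small sketch (the class name is made up): a class that defines only __getitem__, with no __iter__, still supports both 'in' and 'for' via the index-counting iterator.

```python
# Sketch: a class defining __getitem__ but not __iter__ still supports
# `in` and `for`, via the legacy sequence-iteration protocol described
# above (an index-counting iterator calls __getitem__ until IndexError).

class Squares:
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError   # signals the end of the sequence
        return index * index

print(list(Squares()))   # [0, 1, 4, 9, 16]
print(9 in Squares())    # True
print(7 in Squares())    # False
```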
Upvotes: 2
Reputation: 103884
Using the test of alist = ['a' for x in range(100000)] can be very misleading because of string interning. It turns out that Python will intern (in most cases) short immutables -- especially strings -- so that they are all the same object.
Demo:
>>> alist=['a' for x in range(100000)]
>>> len(alist)
100000
>>> len({id(x) for x in alist})
1
You can see that while a list of 100000 strings is created it is only comprised of one interned object.
A fairer test is to use a call to object() to guarantee that each element is a unique Python object:
>>> olist=[object() for x in range(100000)]
>>> len(olist)
100000
>>> len({id(x) for x in olist})
100000
If you compare the in operator with olist you will find the timing to be similar.
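That claim can be checked with a quick sketch (Python 3 here, where all classes are new-style, so both searches end in identity comparison; the Spam class stands in for the question's):

```python
import timeit

# Sketch: unique object() instances are never interned, so `in` must
# compare identity against every element, just as it does for a list
# of (new-style) class instances.

class Spam:
    pass

olist = [object() for _ in range(100_000)]
spamlist = [Spam() for _ in range(100_000)]

t_obj = timeit.timeit(lambda: object() in olist, number=10)
t_spam = timeit.timeit(lambda: Spam() in spamlist, number=10)

print(f"object list: {t_obj:.3f}s")
print(f"Spam list:   {t_spam:.3f}s")
```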
Upvotes: 0
Reputation: 2041
Python creates one immutable 'a' object, and every element in the list points to that same object. Each Spam() call, by contrast, creates a distinct object, so dereferencing the pointers in spamlist touches many different areas of RAM. The performance difference may have something to do with hardware cache hits/misses.
Obviously the performance difference would be even greater if you included the list-creation time in your results (instead of just Spam() in spamlist). Also try x = Spam(); x in spamlist to see if that makes a difference.
I am curious how any(imap(equalsFunc, spamlist)) compares.
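For the curious, a Python 3 sketch of that comparison (imap is gone in Python 3 since map is already lazy; operator.is_ stands in for the unspecified equalsFunc, which remains hypothetical):

```python
import operator
import timeit
from functools import partial

# Sketch: map is lazy in Python 3 (like Python 2's itertools.imap) and
# any() short-circuits, so this scans only until the first identity match.

class Spam:
    pass

target = Spam()
spamlist = [Spam() for _ in range(100_000)]

is_target = partial(operator.is_, target)   # is_target(y) == (target is y)

t_map = timeit.timeit(lambda: any(map(is_target, spamlist)), number=10)
t_in = timeit.timeit(lambda: target in spamlist, number=10)

print(f"any(map(...)): {t_map:.3f}s")
print(f"in operator:   {t_in:.3f}s")
```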
Upvotes: 0