Reputation: 157
Through one of the posts here, I read that using next() to search for and retrieve the first occurrence of an element in a list could be fast. However, I was stunned to see the traditional for-if-break approach perform better by a sizeable margin. Correct me if I made a mistake in my analysis. Here is a snippet of what I tried:
>>> def compare_2():
...     device = 'a'
...     l = ['a', 'b', 'c', 'd']
...     z = next((device for x in l if x == device), None)
...
>>> def compare_1():
...     device = 'a'
...     l = ['a', 'b', 'c', 'd']
...     z = None
...     for x in l:
...         if x == device:
...             z = device
...             break
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import compare_2', stmt='compare_2()')
>>> t.timeit()
1.5207240581512451
>>> t = timeit.Timer(setup='from __main__ import compare_1', stmt='compare_1()')
>>> t.timeit()
0.46623396873474121
I thought this could be happening because I was searching for the first element in the list. I also tried with the last element and noticed that next() performs no better than it did before.
>>> def compare_2():
...     device = 'd'
...     l = ['a', 'b', 'c', 'd']
...     z = next((device for x in l if x == device), None)
...
>>> def compare_1():
...     device = 'd'
...     l = ['a', 'b', 'c', 'd']
...     z = None
...     for x in l:
...         if x == device:
...             z = device
...             break
...
>>> t = timeit.Timer(setup='from __main__ import compare_2', stmt='compare_2()')
>>> t.timeit()
1.6903998851776123
>>> t = timeit.Timer(setup='from __main__ import compare_1', stmt='compare_1()')
>>> t.timeit()
0.66585493087768555
I would love to know when to actually use next() and when not to, in terms of optimizing code. Thanks!
UPDATE:
if device in l would definitely be faster.
I actually just wanted to prototype a simple case. I ran into this while trying to retrieve an object from a list of objects based on an attribute match. For example:
obj = next(obj for obj in obj_list if obj.value == 1)
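One thing worth noting about this pattern: without a second argument, next() raises StopIteration when nothing matches, so in practice I pass a default, as compare_2 above does. A minimal sketch (obj_list and .value are just the placeholder names from the snippet above):

# Sketch: the None default means "no match" instead of a StopIteration crash
obj = next((obj for obj in obj_list if obj.value == 1), None)
if obj is None:
    pass  # handle the no-match case here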
Upvotes: 4
Views: 3507
Reputation: 29033
I wonder if there's something else going on as well. There is some overhead in creating the generator, but I think that putting the condition if x == device in the generator is forcing it to generate the entire list, and create a new list, before next() can run.
See this example comparing a list comprehension, which forces the creation of a new list, with a generator expression, which is lazy and doesn't:
>>> from timeit import Timer
>>> # List comprehension forces a new list to be created in memory
>>> def f1():
...     q = [x for x in xrange(1000)]
...     r = q[1]
...     return r
...
>>> # Generator comprehension does 'lazy' iteration, only when needed
>>> def f2():
...     q = (x for x in xrange(1000))
...     r = next(q)
...     return r
...
>>> Timer(f1).timeit()
47.420308774268435
>>> Timer(f2).timeit()
1.346566078497844
See, the list comprehension is WAY slower; the lazy generator approach means it only starts iterating when you call next(), takes one value, and stops.
Now in this example, the only change is that both take the last element, with if x == 999:
>>> # List comprehension still forces creation of a new list
>>> # although the list only ends up with one element
>>> # nb. it's the last element
>>> def f1():
...     q = [x for x in xrange(1000) if x == 999]
...     r = q[0]
...     return r
...
>>> # Generator comprehension is lazy
>>> # nb. it also only returns the last element
>>> def f2():
...     q = (x for x in xrange(1000) if x == 999)
...     r = next(q)
...     return r
...
>>> Timer(f1).timeit()
37.279105355189984
>>> Timer(f2).timeit()
37.46816399778598
See, they are now basically the same. The generator has slowed right down. The condition forced it to do the same work as the list comprehension: it could not lazily take just the one item that matched without evaluating the entire list.
So I think that in your examples you are not just seeing the overhead of creating a generator and then calling it, as other people have answered and as my original comment said.
I think that by including the if x == device condition, you are forcing the generator construct to iterate the entire list, create a new list object, fill it with all the results, then create a generator on that new list, and finally call it to get the result. So there is a lot more overhead than in the for loop iterating over the existing list; it's not because next() is inherently slow.
Edit: You can see it in the proposal when generator expressions were added to Python: PEP-0289 - Generator Expressions, in the section about Early Binding versus Late Binding:
Asked to summarize the reasoning for binding the first expression, Guido offered [5]:
Consider sum(x for x in foo()). Now suppose there's a bug in foo() that raises an exception, and a bug in sum() that raises an exception before it starts iterating over its argument. Which exception would you expect to see? I'd be surprised if the one in sum() was raised rather than the one in foo(), since the call to foo() is part of the argument to sum(), and I expect arguments to be processed before the function is called.
OTOH, in sum(bar(x) for x in foo()), where sum() and foo() are bug-free, but bar() raises an exception, we have no choice but to delay the call to bar() until sum() starts iterating -- that's part of the contract of generators. (They do nothing until their next() method is first called.)
In other words, if x == device was going to throw an exception because one item in the list couldn't be compared, e.g. a TypeError from a custom object, you would expect to see that exception before next() gets called at all. That forces the entire list to be iterated, losing the saving you might hope to get from generator laziness, and creating more list-creation overhead compared to the for loop.
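A quick way to see that early-binding rule in action (a minimal sketch, not from the original answer; foo() is just a stand-in source of values):

>>> def foo():
...     print("foo() called")    # side effect shows when foo() actually runs
...     return [1, 2, 3]
...
>>> g = (x for x in foo())       # early binding: foo() is evaluated right here
foo() called
>>> next(g)                      # the iteration itself is still deferred
1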
Upvotes: 3
Reputation: 17704
I don't think your code is a fair comparison. When you call next(), you are also creating a generator "from scratch" using the built-in Python syntax. When you iterate over the list in a for loop, the loop calls the list's iter method, which probably returns an iterator that is pre-created, or at least created more efficiently than the generator. I would retry your timings with the generator created beforehand, and then call next(). Another way to time this more fairly might be to use a very, very large list; that way the overhead of creating a generator will be small compared to the cost of traversing to the first element meeting the condition.
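To isolate just that creation overhead, a micro-benchmark along these lines could help (a sketch under my assumptions, not the original poster's code; it times only constructing each object, not searching with it):

from timeit import timeit
# Compare the cost of building a list iterator vs. a fresh generator object
timeit('iter(l)', setup='l = ["a", "b", "c", "d"]')          # list's built-in iterator
timeit('(x for x in l)', setup='l = ["a", "b", "c", "d"]')   # generator expression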
Finally, I don't think you should worry about this sort of thing in Python of all languages. Write the clearest code you can; don't optimize prematurely. Especially in a language like Python, the route to optimization often involves calling libraries written in C with Python bindings (e.g. numpy), or rewriting part of your code in C/C++ yourself.
Edit: Here's some source code and results
# A list of one million 1s with a single 0 in the middle
x = [1] * 1000000
x[500000] = 0

def func1(l):
    # Traditional for-if-break search
    for n in l:
        if n == 0:
            break

def func2(l):
    # Generator expression plus next(); note it iterates the parameter l,
    # not the global x as in the original paste
    z = next((n for n in l if n == 0))
%timeit func1(x)
100 loops, best of 3: 10.4 ms per loop
%timeit func2(x)
100 loops, best of 3: 10.4 ms per loop
next() is fine, but honestly I don't think I've ever used it outside of finding the first element that meets some condition. And I'm an experienced Python programmer who's familiar with this idiom; if I weren't, I would find the for loop far clearer.
Upvotes: 2
Reputation: 14379
next() is very helpful and fast in combination with generators. Unfortunately, you have been using a normal list as the source of the generator. If you want to use the full potential of next(), you have to use a generator as the source as well.
>>> timeit.timeit('next((i for i in range(1000)))')
10.571892976760864
>>> timeit.timeit('next((i for i in xrange(1000)))')
0.9348869323730469
The more elements there are, the larger the difference. With the second version, Python does not have to build the whole list; it only has to produce the first element.
So next() is not faster by itself, but the concept of using iterables and generators is, when used consistently. And next() was designed for them.
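As an aside (an addition, not part of the original answer, which uses Python 2): on Python 3, xrange() is gone and range() is itself lazy, so both spellings behave like the fast version and next() only ever pulls the first element:

# Python 3 only: range() is lazy, so no list is ever materialised here
timeit.timeit('next((i for i in range(1000)))')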
Upvotes: 2