Reputation: 1715
Given a list with descending order, e.g. [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
and threshold = 1.2
, I want to get sublist from original list with all elements larger than threshold
Method1:
orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = [i for i in orgin_lst if i > threshold]
This is pythonic way but we don't use the descending property and cannot break out when found a element not larger than threshold. If there are few satisfied elements but oringal list is very large, the performance is not good.
Method2:
orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = []
for i in orgin_lst:
if i <= threshold:
break
lst.append(i)
However this code is not quite pythonic.
Is there a way that I can combine pythonic style and performance?
Upvotes: 4
Views: 458
Reputation: 27588
Binary search is fast for sorted data, O(log n) time. And Python's bisect
module already does it. It wants increasing data and yours is decreasing, but we can virtually make it increasing. Just use its shiny new key
parameter to negate the O(log n) accessed elements (and search for the negated threshold):
from bisect import bisect_left
from operator import neg
i = bisect_left(orgin_lst, -threshold, key=neg)
lst = orgin_lst[:i]
Alternatively, use a key function that returns False
for values larger than the threshold and True
otherwise. Since False
is smaller than True
(they act like 0
and 1
, respectively), we again have a monotonically increasing sequence and can use bisect
with this:
from bisect import bisect
i = bisect(orgin_lst, False, key=lambda x: x <= threshold)
lst = orgin_lst[:i]
If you don't need a separate new list, you could use del orgin_lst[i:]
to instead remove the unwanted elements.
Previously I would've written a proxy class to do the job now done by that much more convenient key parameter:
from bisect import bisect_left
class Negate:
def __getitem__(_, i):
return -orgin_lst[i]
i = bisect_left(Negate(), -threshold, 0, len(orgin_lst))
lst = orgin_lst[:i]
Or I might've written binary search myself, but I've done that so many times that at some point I started to loathe it.
Under your Method1, the list comprehension comparing every element, you wrote: "If there are few satisfied elements but oringal list is very large, the performance is not good". If that was not just an argument against that list comprehension but you actually do have mostly very few satisfied elements and a very long list, then exponential search could be better than binary search. But it would be more code (unless you find a package for it, I guess).
A simple iterative search like your Method2 (which I btw do find pythonic) or Chris' answer or with itertools.takewhile
would also be fast in such extreme cases, but for cases with large numbers of satisfied elements, they'd be much slower than binary search and exponential search.
itertools.takewhile
Like I said it would be slower in general, but it's fast for those best-cases and it's quite simple and clean:
from itertools import takewhile
lst = list(takewhile(lambda x: x > threshold, orgin_lst))
Like I said I do find your loop pythonic, and it's good for best-cases. But calling append
to individually append elements to the result is quite costly. Would probably be faster to at first just find the first too small element, then find its index and slice:
for i in orgin_lst:
if i <= threshold:
lst = orgin_lst[:orgin_lst.index(i)]
break
else:
lst = orgin_lst[:]
Again, if you're ok with just removing the unwanted elements from the existing list, use del
inside the if
and then you don't need the else
part here.
A similar solution I wrote for another question ended up second-fastest in the benchmark there.
Alternative implementation:
cut = None
for i in orgin_lst:
if i <= threshold:
cut = orgin_lst.index(i)
break
lst = orgin_lst[:cut]
Upvotes: 9
Reputation: 36496
I think your code was very close:
orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = []
for i in orgin_lst:
if i <= threshold:
break
lst.append(i)
But let's employ a generator.
def take_until(f, it):
for x in it:
if f(x): return
yield x
Now, we can write something like the following, for instance.
>>> for x in take_until(lambda x: x <= 1.2, lst):
... print(x)
...
10
9
8
7
6
5
4
3
2
>>>
Heck, if we really want a list
, that's just as easy.
>>> list(take_until(lambda x: x <= 1.2, lst))
[10, 9, 8, 7, 6, 5, 4, 3, 2]
>>>
Upvotes: 1