Reputation: 43149
In Python v2.7, if I have a function like this:
def holes_between(intervals):
# Compute the holes between the intervals, for example:
# given the intervals: ([ 8, 9] [14, 18] [19, 20] [23, 32] [34, 49])
# compute the holes: ([10, 13] [21, 22] [33, 33])
prec = intervals[0][1] + 1 # Bootstrap the iteration
for low, high in intervals[1:]:
if prec <= low - 1:
yield (prec, low - 1)
prec = high + 1
holes = list(holes_between(intervals))
Since the yield
of the function is being collected into a list
, is it more efficient to just build the list inside the holes_between
function and if so, how would that be done most efficiently?
Upvotes: 3
Views: 194
Reputation: 70562
In general, I'd say, the flexibility of a lazy evaluation outweighs the minor performance hit that you may get by using a generator. In cases where not all of the enumerable is used, the generator approach will perform better.
For example, let's say you wanted a function that checks for a maximum threshold for the size of holes between intervals:
def threshold(intervals, n):
for low, high in holes_between(intervals):
if (high - low + 1) >= n:
return True
return False
In this case, if the intervals iterable is large, then you can potentially save a lot of work if the threshold is exceeded early on. In general, any one of these kinds of "early-returning" functions would benefit from a generator.
If this is a critical part of your code and you've measured a definitive performance problem, then yes, you could rewrite it to build the list in the holes_between
function itself. If you do so, be sure to benchmark the function to see if you actually made it perform better.
Upvotes: 0
Reputation: 1122222
A generator function can be less efficient than building a list directly.
You can just build the list in the holes_between()
function and return that:
def holes_between(intervals):
prec = intervals[0][1] + 1 # Bootstrap the iteration
result = []
for low, high in intervals[1:]:
if prec <= low - 1:
result.append((prec, low - 1))
prec = high + 1
return result
but do measure the differences, using the timeit
module.
If you have some typical input, you'd test that with:
import timeit
def holes_between_list(intervals):
prec = intervals[0][1] + 1 # Bootstrap the iteration
result = []
for low, high in intervals[1:]:
if prec <= low - 1:
result.append((prec, low - 1))
prec = high + 1
return result
def holes_between_generate(intervals):
prec = intervals[0][1] + 1 # Bootstrap the iteration
for low, high in intervals[1:]:
if prec <= low - 1:
yield (prec, low - 1)
prec = high + 1
intervals = [ ... ] # fill in some test data
print 'As list:', timeit.timeit(
'holes_between(intervals)',
'from __main__ import intervals, holes_between_list as holes_between')
print 'Using a generator:', timeit.timeit(
'list(holes_between(intervals))',
'from __main__ import intervals, holes_between_generate as holes_between')
The lower value is the faster method for your test data.
Upvotes: 3