WilliamKF
WilliamKF

Reputation: 43149

Can Python yield into a list be made more efficient?

In Python v2.7, if I have a function like this:

def holes_between(intervals):
  # Compute the holes between the intervals, for example:
  #   given the intervals: ([ 8,  9] [14, 18] [19, 20] [23, 32] [34, 49])
  #     compute the holes: ([10, 13] [21, 22] [33, 33])
  prec = intervals[0][1] + 1 # Bootstrap the iteration
  for low, high in intervals[1:]:
    if prec <= low - 1:
      yield (prec, low - 1)
    prec = high + 1

holes = list(holes_between(intervals))

Since the yield of the function is being collected into a list, is it more efficient to just build the list inside the holes_between function and if so, how would that be done most efficiently?

Upvotes: 3

Views: 194

Answers (2)

voithos
voithos

Reputation: 70562

In general, I'd say, the flexibility of a lazy evaluation outweighs the minor performance hit that you may get by using a generator. In cases where not all of the enumerable is used, the generator approach will perform better.

For example, let's say you wanted a function that checks for a maximum threshold for the size of holes between intervals:

def threshold(intervals, n):
    for low, high in holes_between(intervals):
        if (high - low + 1) >= n:
            return True
    return False

In this case, if the intervals iterable is large, then you can potentially save a lot of work if the threshold is exceeded early on. In general, any one of these kinds of "early-returning" functions would benefit from a generator.

If this is a critical part of your code and you've measured a definitive performance problem, then yes, you could rewrite it to build the list in the holes_between function itself. If you do so, be sure to benchmark the function to see if you actually made it perform better.

Upvotes: 0

Martijn Pieters
Martijn Pieters

Reputation: 1122222

A generator function can be less efficient than building a list directly.

You can just build the list in the holes_between() function and return that:

def holes_between(intervals):
    prec = intervals[0][1] + 1 # Bootstrap the iteration
    result = []
    for low, high in intervals[1:]:
        if prec <= low - 1:
            result.append((prec, low - 1))
        prec = high + 1
    return result

but do measure the differences, using the timeit module.

If you have some typical input, you'd test that with:

import timeit

def holes_between_list(intervals):
    prec = intervals[0][1] + 1 # Bootstrap the iteration
    result = []
    for low, high in intervals[1:]:
        if prec <= low - 1:
            result.append((prec, low - 1))
        prec = high + 1
    return result

def holes_between_generate(intervals):
    prec = intervals[0][1] + 1 # Bootstrap the iteration
    for low, high in intervals[1:]:
        if prec <= low - 1:
            yield (prec, low - 1)
        prec = high + 1

intervals = [ ... ] # fill in some test data

print 'As list:', timeit.timeit(
    'holes_between(intervals)',
    'from __main__ import intervals, holes_between_list as holes_between')

print 'Using a generator:', timeit.timeit(
    'list(holes_between(intervals))',
    'from __main__ import intervals, holes_between_generate as holes_between')

The lower value is the faster method for your test data.

Upvotes: 3

Related Questions