Michael Kristofik
Michael Kristofik

Reputation: 35188

Help needed improving Python code using List Comprehensions

I've been writing little Python programs at home to learn more about the language. The most recent feature I've tried to understand are List Comprehensions. I created a little script that estimates when my car needs its next oil change based on how frequently I've gotten the oil changed in the past. In the code snippet below, oil_changes is a list of the mileages at which I got the oil changed.

# Compute a list of the mileage differences between each oil change.
diffs = [j - i for i, j in zip(oil_changes[:-1], oil_changes[1:])]

# Use the average difference between oil changes to estimate the next change.
next_oil = oil_changes[-1] + sum(diffs) / len(diffs)

The code produces the right answer (did the math by hand to check) but it doesn't feel quite Pythonic yet. Am I doing a lot of needless copying of the original list in the first line? I feel like there's a much better way to do this but I don't know what it is.

Upvotes: 3

Views: 567

Answers (5)

John Machin
John Machin

Reputation: 82934

Try this:

assert len(oil_changes) >= 2
sum_of_diffs = oil_changes[-1] - oil_changes[0]
number_of_diffs = len(oil_changes) - 1
average_diff = sum_of_diffs / float(number_of_diffs)

Upvotes: 9

Alex Martelli
Alex Martelli

Reputation: 881605

As other answers pointed out, you don't really need to worry unless your oil_changes list is extremely long. However, as a fan of "stream-based" computing, I think it's interesting to point out that itertools offers all the tools you need to compute your next_oil value in O(1) space (and O(N) time of course!-) no matter how big N, that is, len(next_oil), gets.

izip per se is insufficient, because it only reduces a bit the multiplicative constant but leaves your space demands as O(N). The key idea to bring those demands down to O(1) is to pair izip with tee -- and avoiding the list comprehension, which would be O(N) in space anyway, in favor of a good simple old-fashioned loop!-). Here comes:

  it = iter(oil_changes)
  a, b = itertools.tee(it)
  b.next()
  thesum = 0
  for thelen, (i, j) in enumerate(itertools.izip(a, b)):
    thesum += j - i
  last_one = j
  next_oil = last_one + thesum / (thelen + 1)

Instead of taking slices from the list, we take an iterator on it, tee it (making two independently advanceable clones thereof), and advance, once, one of the clones, b. tee takes space O(x) where x is the maximum absolute difference between the advancement of the various clones; here, the two clones' advancement only differs by 1 at most, so the space requirement is clearly O(1).

izip makes a one-at-a-time "zipping" of the two slightly-askew clone iterators, and we dress it up in enumerate so we can track how many times we go through the loop, i.e. the length of the iterable we're iterating on (we need the +1 in the final expression, because enumerate starts from 0!-). We compute the sum with a simple +=, which is fine for numbers (sum is even better, but it wouldn't track the length!-).

It's tempting after the loop to use last_one = a.next(), but that would not work because a is actually exhausted -- izip advances its argument iterables left to right, so it has advanced a one last time before it realizes b is over!-). That's OK, because Python loop variables are NOT limited in scope to the loop itself -- after the loop, j still has the value that was last extracted by advancing b before izip gave up (just like thelen still has the last count value returned by enumerate). I'm still naming the value last_one rather than using j directly in the final expression, because I think it's clearer and more readable.

So there it is -- I hope it was instructive!-) -- although for the solution of the specific problem that you posed this time, it's almost certain to be overkill. We Italians have an ancient proverb -- "Impara l'Arte, e mettila da parte!"... "Learn the Art, and then set it aside" -- which I think is quite applicable here: it's a good thing to learn advanced and sophisticated ways to solve very hard problems, in case you ever meet them, but for all that you need to go for simplicity and directness in the vastly more common case of simple, ordinary problems -- not apply advanced solutions that most likely won't be needed!-)

Upvotes: 8

John Fouhy
John Fouhy

Reputation: 42183

Am I doing a lot of needless copying of the original list in the first line?

Technically, yes. Realistically, no. Unless you've changed your oil literally millions of times, the speed penalty is unlikely to be significant. You could change zip to izip, but it hardly seems worth it (and in python 3.0, zip effectively is izip).

Insert that old quote by Knuth here.

(you could also replace oil_changes[:-1] with just oil_changes, since zip() truncates to the length of the shortest input sequence anyway)

Upvotes: 2

John Kugelman
John Kugelman

Reputation: 361595

The itertools package provides additional generator-style functions. For instance, you can use izip in place of zip to save on some memory.

You could also perhaps write an average function so you can turn diffs into a generator instead of a list comprehension:

from itertools import izip

def average(items):
    sum, count = 0, 0

    for item in items:
        sum   += item
        count += 1

    return sum / count

diffs = (j - i for i, j in izip(oil_changes[:-1], oil_changes[1:])
next_oil = oil_changes[-1] + average(diffs)

Alternatively, you could change your definition of diffs to:

diffs = [oil_changes[i] - oil_changes[i-1] for i in xrange(1, len(oil_changes))]

I dunno, it's not really a huge improvement. Your code is pretty good as is.

Upvotes: 3

ironfroggy
ironfroggy

Reputation: 8119

It seems fine, really. Not everything is simple (you've got several steps in an otherwise simple calculation, no matter how you frame it). There are options to reduce the copies, like using itertools.islice and itertools.izip, but (aside from izip) the extra steps in the code would just complicate it further. Not everything needs to be a list comprehension, but it is a judgement call sometimes. What looks cleaner to you? What will the next guy that reads it understand best? What will you understand when you come back to fix that bug in three months?

Upvotes: 2

Related Questions