Reputation: 35188
I've been writing little Python programs at home to learn more about the language. The most recent feature I've tried to understand are List Comprehensions. I created a little script that estimates when my car needs its next oil change based on how frequently I've gotten the oil changed in the past. In the code snippet below, oil_changes
is a list of the mileages at which I got the oil changed.
# Compute a list of the mileage differences between each oil change.
diffs = [j - i for i, j in zip(oil_changes[:-1], oil_changes[1:])]
# Use the average difference between oil changes to estimate the next change.
next_oil = oil_changes[-1] + sum(diffs) / len(diffs)
The code produces the right answer (did the math by hand to check) but it doesn't feel quite Pythonic yet. Am I doing a lot of needless copying of the original list in the first line? I feel like there's a much better way to do this but I don't know what it is.
Upvotes: 3
Views: 567
Reputation: 82934
Try this:
assert len(oil_changes) >= 2
sum_of_diffs = oil_changes[-1] - oil_changes[0]
number_of_diffs = len(oil_changes) - 1
average_diff = sum_of_diffs / float(number_of_diffs)
Upvotes: 9
Reputation: 881605
As other answers pointed out, you don't really need to worry unless your oil_changes
list is extremely long. However, as a fan of "stream-based" computing, I think it's interesting to point out that itertools
offers all the tools you need to compute your next_oil
value in O(1) space (and O(N) time of course!-) no matter how big N, that is, len(next_oil)
, gets.
izip
per se is insufficient, because it only reduces a bit the multiplicative constant but leaves your space demands as O(N). The key idea to bring those demands down to O(1) is to pair izip
with tee
-- and avoiding the list comprehension, which would be O(N) in space anyway, in favor of a good simple old-fashioned loop!-). Here comes:
it = iter(oil_changes)
a, b = itertools.tee(it)
b.next()
thesum = 0
for thelen, (i, j) in enumerate(itertools.izip(a, b)):
thesum += j - i
last_one = j
next_oil = last_one + thesum / (thelen + 1)
Instead of taking slices from the list, we take an iterator on it, tee it (making two independently advanceable clones thereof), and advance, once, one of the clones, b
. tee
takes space O(x) where x is the maximum absolute difference between the advancement of the various clones; here, the two clones' advancement only differs by 1 at most, so the space requirement is clearly O(1).
izip
makes a one-at-a-time "zipping" of the two slightly-askew clone iterators, and we dress it up in enumerate
so we can track how many times we go through the loop, i.e. the length of the iterable we're iterating on (we need the +1 in the final expression, because enumerate
starts from 0!-). We compute the sum with a simple +=
, which is fine for numbers (sum
is even better, but it wouldn't track the length!-).
It's tempting after the loop to use last_one = a.next()
, but that would not work because a
is actually exhausted -- izip
advances its argument iterables left to right, so it has advanced a
one last time before it realizes b
is over!-). That's OK, because Python loop variables are NOT limited in scope to the loop itself -- after the loop, j
still has the value that was last extracted by advancing b
before izip
gave up (just like thelen
still has the last count value returned by enumerate
). I'm still naming the value last_one
rather than using j
directly in the final expression, because I think it's clearer and more readable.
So there it is -- I hope it was instructive!-) -- although for the solution of the specific problem that you posed this time, it's almost certain to be overkill. We Italians have an ancient proverb -- "Impara l'Arte, e mettila da parte!"... "Learn the Art, and then set it aside" -- which I think is quite applicable here: it's a good thing to learn advanced and sophisticated ways to solve very hard problems, in case you ever meet them, but for all that you need to go for simplicity and directness in the vastly more common case of simple, ordinary problems -- not apply advanced solutions that most likely won't be needed!-)
Upvotes: 8
Reputation: 42183
Am I doing a lot of needless copying of the original list in the first line?
Technically, yes. Realistically, no. Unless you've changed your oil literally millions of times, the speed penalty is unlikely to be significant. You could change zip
to izip
, but it hardly seems worth it (and in python 3.0, zip
effectively is izip
).
Insert that old quote by Knuth here.
(you could also replace oil_changes[:-1]
with just oil_changes
, since zip()
truncates to the length of the shortest input sequence anyway)
Upvotes: 2
Reputation: 361595
The itertools
package provides additional generator-style functions. For instance, you can use izip
in place of zip
to save on some memory.
You could also perhaps write an average
function so you can turn diffs
into a generator instead of a list comprehension:
from itertools import izip
def average(items):
sum, count = 0, 0
for item in items:
sum += item
count += 1
return sum / count
diffs = (j - i for i, j in izip(oil_changes[:-1], oil_changes[1:])
next_oil = oil_changes[-1] + average(diffs)
Alternatively, you could change your definition of diffs
to:
diffs = [oil_changes[i] - oil_changes[i-1] for i in xrange(1, len(oil_changes))]
I dunno, it's not really a huge improvement. Your code is pretty good as is.
Upvotes: 3
Reputation: 8119
It seems fine, really. Not everything is simple (you've got several steps in an otherwise simple calculation, no matter how you frame it). There are options to reduce the copies, like using itertools.islice and itertools.izip, but (aside from izip) the extra steps in the code would just complicate it further. Not everything needs to be a list comprehension, but it is a judgement call sometimes. What looks cleaner to you? What will the next guy that reads it understand best? What will you understand when you come back to fix that bug in three months?
Upvotes: 2