Is nesting generators in Python better?

Question

Ok, so I have tried reading a couple of sources: One, Two, Three. The first source clearly points out that using generator expressions conserves memory and makes things faster. But what happens when they are nested? Here is an example, and I'm hoping someone can help me understand it.

In  [1]: def nested():
             [(ix, '-'.join(str(x) for x in hours)) for ix in table.index.get_level_values(0)]
         %timeit nested
Out [1]: 10000000 loops, best of 3: 33.5 ns per loop

In  [2]: def not_nested():
             hr = '-'.join(str(x) for x in hours) #hr = 7-8
             [(ix, hr) for ix in table.index.get_level_values(0)]
         %timeit not_nested
Out [2]: 10000000 loops, best of 3: 38.7 ns per loop

In the above example, hours is a list of 2 elements, while table has 32 index entries at level 0.

If I were to run the two functions in my head, I would assume that in nested the second half of the tuple, '-'.join(str(x) for x in hours), is evaluated as many times as the outer (ix) loop runs (that is, once per index entry in table). In not_nested, however, the second half of the tuple is computed once (stored in hr) and is not re-evaluated on each iteration of the comprehension on the second line.

First, am I correct in thinking that this is how Python works? If I am, can anyone explain how it is that the run time of the nested function is shorter than that of the one that is not nested?
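A minimal sketch of the assumption described above, for anyone who wants to check it directly. The names counting_str, calls and index are hypothetical stand-ins (index replaces table.index.get_level_values(0), which is not shown here); the counter records how often the inner generator's conversion runs in each variant.

```python
# Hypothetical stand-ins: `index` replaces table.index.get_level_values(0).
hours = [7, 8]
index = list(range(32))

# Count how many times the inner conversion runs in each variant.
calls = {"nested": 0, "hoisted": 0}

def counting_str(x, key):
    calls[key] += 1
    return str(x)

def nested():
    # The inner generator expression is rebuilt and consumed once per ix.
    return [(ix, '-'.join(counting_str(x, "nested") for x in hours))
            for ix in index]

def not_nested():
    # The joined string is computed once and reused for every ix.
    hr = '-'.join(counting_str(x, "hoisted") for x in hours)
    return [(ix, hr) for ix in index]

a = nested()
b = not_nested()
print(calls)
```

Both functions build the same list, but the nested variant performs the conversion len(index) * len(hours) times, while the hoisted variant performs it len(hours) times.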

Begin Edit/Solution

Apparently I made a mistake in calling the function, as the two answers point out. I re-ran timeit with the correct function call and got the expected results:

In  [1]: %timeit nested()
Out [1]: 10000 loops, best of 3: 55.5 µs per loop

In  [2]: %timeit not_nested()
Out [2]: 100000 loops, best of 3: 4.53 µs per loop

So the runtime when not nested is a fraction of the runtime when nested. Thanks to all for answering and clearing this up!

End Edit

Steve Jessop · Accepted Answer

You aren't measuring the time to call the function, you're measuring the time to look up the name of the function in the current scope.

Try %timeit nested().
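Outside IPython, the same distinction can be illustrated with the standard timeit module. This is only a sketch: work is a hypothetical stand-in function, and the absolute numbers will vary by machine. The statement "work" only evaluates the name, while "work()" actually calls the function.

```python
import timeit

def work():
    # Hypothetical stand-in for a function that does some real work.
    return sum(range(1000))

ns = {"work": work}

# Times only the name lookup; the function body never runs.
lookup = timeit.timeit("work", globals=ns, number=10_000)

# Times the actual call.
call = timeit.timeit("work()", globals=ns, number=10_000)

print(f"lookup: {lookup:.6f}s  call: {call:.6f}s")
```

The lookup figure should come out far smaller than the call figure, which is the same kind of gap as between the nanosecond and microsecond results in the question.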

As for why there's a difference in time -- it's tempting to say "because the second name is longer, so the string comparison in the hash lookup takes longer", but I don't think that's actually correct because (I think) the strings involved will have been interned by CPython anyway. Certainly, I can't get Python to consistently report longer times for looking up longer function names in my own tests.

If you're interested in the lookup time, start by running it more times and see how consistent the numbers are.
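One way to check that consistency is timeit.repeat, which returns one timing per repetition. The sketch below uses empty placeholder functions named nested and not_nested (not the originals from the question) and only times looking up each name; the repeat and number values are arbitrary choices.

```python
import timeit

# Placeholder functions; only their names matter for a lookup timing.
def nested():
    pass

def not_nested():
    pass

namespace = {"nested": nested, "not_nested": not_nested}
results = {}

for name in ("nested", "not_nested"):
    # Each sample is the total time for `number` name lookups.
    samples = timeit.repeat(name, globals=namespace, repeat=5, number=200_000)
    results[name] = samples
    print(f"{name}: min={min(samples):.4f}s max={max(samples):.4f}s")
```

If the spread between min and max for one name is as large as the gap between the two names, the difference is probably noise rather than an effect of the name itself.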
