Kartik
Kartik

Reputation: 8703

Is nesting generators in Python better?

Ok, so I have tried reading a couple of sources: One, Two, Three. The first source clearly points that using generator expressions conserves memory, and makes things faster. But what happens when they get nested? Here is an example, and I'm hoping that someone will be able to help me understand it.

In  [1]: def nested():
             [(ix, '-'.join(str(x) for x in hours)) for ix in table.index.get_level_values(0)]
         %timeit nested
Out [1]: 10000000 loops, best of 3: 33.5 ns per loop

In  [2]: def not_nested():
             hr = '-'.join(str(x) for x in hours) #hr = 7-8
             [(ix, hr) for ix in table.index.get_level_values(0)]
         %timeit not_nested
Out [2]: 10000000 loops, best of 3: 38.7 ns per loop

In the above example, hours is a list 2 elements long, while the number of indexes in table at level 0 is 32.

If I were to run the two functions within my head, I would assume that in function nested the second half of the tuple ('-'.join(str(x) for x in hours) will be called as many times as the 'outer' (ix) loop runs (that is as many times as there are indexes in table). However, in function not_nested the second half of the tuple gets initialized once (stored in hr), and will not be run each time the second line executes.

First, am I correct in thinking that that is how Python works? If I am then can anyone explain how is that the run time of the nested function shorter than the one not nested?

Begin Edit/Solution

Apparently, I made a mistake in calling the function, as the two answers enlighten. I re-ran timeit with the correct function call, and got expected results:

In  [1]: %timeit nested()
Out [1]: 10000 loops, best of 3: 55.5 µs per loop

In  [2]: %timeit not_nested()
Out [2]: 100000 loops, best of 3: 4.53 µs per loop

Ultimately, making the runtime when not nested a fraction of the runtime when nested. Thanks to all for answering and clearing this out!

End Edit

Upvotes: 0

Views: 97

Answers (2)

enrico.bacis
enrico.bacis

Reputation: 31524

You are timing how long it takes to retrieve the function from the namespace. The namespace is a dictionary and in order to search something in a dictionary, python has to invoke the hash function on the key (the function name).

Longer keys take more time to produce the hash, so the delta you are seeing is because nested is 6 letters, while not_nested is 10 letters. Try to switch the names and magically the slower function will become faster according to your test. In order to time a function you have to invoke it:

%timeit nested()
%timeit not_nested()

Upvotes: 1

Steve Jessop
Steve Jessop

Reputation: 279335

You aren't measuring the time to call the function, you're measuring the time to look up the name of the function in the current scope.

Try %timeit nested().

As for why there's a difference in time -- it's tempting to say "because the second name is longer, so the string comparison in the hash lookup takes longer", but I don't think that's actually correct because (I think) the strings involved will have been interned by CPython anyway. Certainly, I can't get Python to consistently report longer times for looking up longer function names in my own tests.

If you're interested in the lookup time, start by running it more times and see how consistent the numbers are.

Upvotes: 2

Related Questions