Reputation: 157
I'm a programming newbie and am having some trouble understanding an example from my Python textbook ("Beginning Python" by Magnus Lie Hetland). The example is a recursive generator designed to flatten the elements of nested lists (with arbitrary depth):
def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested
You would then feed in a nested list as follows:
>>> list(flatten([[[1],2],3,4,[5,[6,7]],8]))
[1, 2, 3, 4, 5, 6, 7, 8]
I understand how the recursion within flatten() helps to whittle down to the innermost element of this list, '1', but what I don't understand is what happens when '1' is actually passed back into flatten() as 'nested'. I thought that this would lead to a TypeError (can't iterate over a number), and that the exception handling was what would actually do the heavy lifting for generating output... but testing with modified versions of flatten() has convinced me that this isn't the case. Instead, it seems like the 'yield element' line is responsible.
That said, my question is this... how can 'yield element' ever actually be executed? It seems like 'nested' will either be a list - in which case another layer of recursion is added - or it's a number and you get a TypeError.
Any help with this would be much appreciated... in particular, I'd love to be walked through the chain of events as flatten() handles a simple example like:
list(flatten([[1,2],3]))
Upvotes: 10
Views: 4658
Reputation: 151007
Perhaps part of your confusion is that you're thinking of the final yield statement as though it were a return statement. Indeed, a couple of people have suggested that when a TypeError is raised in this code, the item passed is "returned". That's not the case!
Remember that any time yield appears in a function, the result of calling it is not a single item, but an iterable, even if only one item appears in the sequence. So when you pass 1 to flatten, the result is a one-item generator. To get the item out of it, you still need to iterate over it.
Since this one-item generator is iterable, it doesn't raise a TypeError when the inner for loop tries to iterate over it; but the inner for loop only executes once. Then the outer for loop moves on to the next item in the nested list.
Another way to think about this would be to say that every time you pass a non-iterable value to flatten, it wraps the value in a one-item iterable and "returns" that.
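To see the "one-item generator" point concretely, here is a minimal sketch that repeats the textbook function and pokes at what flatten(1) gives back. The variable name gen and the use of inspect.isgenerator are my own additions for illustration, not part of the book's example.

```python
import inspect


def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested


gen = flatten(1)                  # nothing in the body has run yet
print(inspect.isgenerator(gen))   # True: a generator object, not the int 1
print(next(gen))                  # body runs, TypeError is caught, 1 is yielded
print(list(gen))                  # []: the generator is exhausted after one item
print(list(flatten(1)))           # [1]
```

Because flatten(1) is itself iterable, the inner for loop in the caller iterates over it without any error and simply runs its body once.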
Upvotes: 6
Reputation: 77137
When you understand a function in general but one small part is stumping you, a great way to break it down is to use the Python debugger. Here is a session, with comments added:
-> def flatten(nested):
(Pdb) l
  1  -> def flatten(nested):
  2         try:
  3             for sublist in nested:
  4                 for element in flatten(sublist):
  5                     yield element
  6         except TypeError:
  7             yield nested
  8
  9     import pdb; pdb.set_trace()
 10     list(flatten([[1,2],3]))
 11
(Pdb) a
nested = [[1, 2], 3]
Above, we've just entered the function and the argument is [[1, 2], 3]. Let's use pdb's step function to step through the function into any recursive calls we should encounter:
(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(4)flatten()
-> for element in flatten(sublist):
(Pdb) s
--Call--
> /Users/michael/foo.py(1)flatten()
-> def flatten(nested):
(Pdb) a
nested = [1, 2]
We've stepped into one inner frame of flatten, where the argument is [1, 2].
(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(4)flatten()
-> for element in flatten(sublist):
(Pdb) s
--Call--
> /Users/michael/foo.py(1)flatten()
-> def flatten(nested):
(Pdb) a
nested = 1
Two frames in, the argument 1 isn't an iterable anymore. This should be interesting…
(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
TypeError: "'int' object is not iterable"
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(6)flatten()
-> except TypeError:
(Pdb) s
> /Users/michael/foo.py(7)flatten()
-> yield nested
(Pdb) s
--Return--
> /Users/michael/foo.py(7)flatten()->1
-> yield nested
OK, so because of the except TypeError, we're just yielding the argument itself. Up a frame!
(Pdb) s
> /Users/michael/foo.py(5)flatten()
-> yield element
(Pdb) l
  1     def flatten(nested):
  2         try:
  3             for sublist in nested:
  4                 for element in flatten(sublist):
  5  ->                 yield element
  6         except TypeError:
  7             yield nested
  8
  9     import pdb; pdb.set_trace()
 10     list(flatten([[1,2],3]))
 11
yield element will of course yield 1, so once our lowest frame hits a TypeError, the result propagates all the way up the stack to the outermost frame of flatten, which yields it to the outside world before moving on to further parts of the outer iterable.
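The "propagates up the stack" step is exactly what the inner for loop does: each frame re-yields whatever the frame below it produced. As a rough sketch of the same idea, and assuming Python 3.3 or later, the inner loop can be written with yield from. This is a variant I'm adding for illustration, not the textbook's version.

```python
def flatten(nested):
    try:
        for sublist in nested:
            # Hand every value produced by the recursive generator
            # straight up to whoever is iterating over this one.
            yield from flatten(sublist)
    except TypeError:
        # nested was not iterable, so it is a leaf: yield it as-is.
        yield nested


print(list(flatten([[1, 2], 3])))                        # [1, 2, 3]
print(list(flatten([[[1], 2], 3, 4, [5, [6, 7]], 8])))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Written this way, it may be easier to see that yield element is never the line that "finds" a value. It only passes along values that some deeper frame produced in its except branch.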
Upvotes: 4
Reputation: 251398
yield element can be executed if nested is a list but sublist is not (i.e., if nested is a normal "flat" list). In this case, for sublist in nested will work fine. When the next line recursively calls flatten(sublist), a TypeError is raised as soon as the recursive call tries to iterate over the "sublist" (which is not iterable). This TypeError is caught inside the recursive call, which yields its entire input (the non-iterable item) back, so that item is what the for element in flatten(sublist) loop iterates over. In other words, for element in flatten(sublist) winds up doing for element in [sublist] if sublist is not iterable.
The key thing to recognize is that even a non-nested list will result in a recursive call. A call like flatten([1]) will result in two yields: the recursive call will yield 1 to the outer call, and the outer call immediately re-yields 1.
This version of the function may help to understand what's going on:
def flatten(nested, indent=""):
    try:
        print(indent, "Going to iterate over", nested)
        for sublist in nested:
            print(indent, "Going to iterate over flattening of", sublist)
            for element in flatten(sublist, indent+" "):
                print(indent, "Yielding", element)
                yield element
    except TypeError:
        print(indent, "Type Error! Yielding", nested)
        yield nested
>>> list(flatten([[1,2],3]))
 Going to iterate over [[1, 2], 3]
 Going to iterate over flattening of [1, 2]
  Going to iterate over [1, 2]
  Going to iterate over flattening of 1
   Going to iterate over 1
   Type Error! Yielding 1
  Yielding 1
 Yielding 1
  Going to iterate over flattening of 2
   Going to iterate over 2
   Type Error! Yielding 2
  Yielding 2
 Yielding 2
 Going to iterate over flattening of 3
  Going to iterate over 3
  Type Error! Yielding 3
 Yielding 3
[1, 2, 3]
Upvotes: 1
Reputation: 56654
I have added some instrumentation to the function:
def flatten(nested, depth=0):
    try:
        print("{}Iterate on {}".format(' '*depth, nested))
        for sublist in nested:
            for element in flatten(sublist, depth+1):
                print("{}got back {}".format(' '*depth, element))
                yield element
    except TypeError:
        print('{}not iterable - return {}'.format(' '*depth, nested))
        yield nested
Now calling
list(flatten([[1,2],3]))
displays
Iterate on [[1, 2], 3]
 Iterate on [1, 2]
  Iterate on 1
  not iterable - return 1
 got back 1
got back 1
  Iterate on 2
  not iterable - return 2
 got back 2
got back 2
 Iterate on 3
 not iterable - return 3
got back 3
Upvotes: 12
Reputation: 2804
The try/except construction catches the exception for you and yields nested back, which is just the argument that was given to flatten().
So in flatten(1), the line for sublist in nested: raises a TypeError; execution continues with the except part, which yields nested, which is 1.
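If it helps, here is a small sketch of where that TypeError comes from. A for statement asks for an iterator over its target, which is roughly what the built-in iter() does, so calling iter(1) directly shows the same error outside of flatten(). The variable name message is just something I introduced for the demonstration.

```python
def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested


# Asking for an iterator over an int fails the same way the for loop does.
try:
    iter(1)
except TypeError as exc:
    message = str(exc)

print(message)             # 'int' object is not iterable
print(list(flatten(1)))    # [1]: the except branch yielded the argument itself
```

Note that the exception is raised and handled inside the innermost call. The callers further up never see a TypeError, only the value that was yielded.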
Upvotes: 1