Reputation: 853
I need some clarification on what is going on with the following behavior when using a generator. I'm afraid I am missing something fundamental, so any advice is welcome.
(edit) Specifically, my question deals with the del of the iterable that an iterator is created on.
My ultimate use case is that I am iterating over a pretty massive corpus for text processing. It isn't so large that it can't be held in memory, but is big enough that training a subsequent model is impossible with it in memory.
So, in my investigation, I attempted the following and I'm confused how this works.
>>> iterable = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
>>> iterable.__sizeof__()
160
>>> iterator = (x+1 for x in iterable)
>>> iterator
<generator object <genexpr> at 0x1019f8570>
>>> iterator.__sizeof__()
64
>>> del iterable
>>> for i in iterator:
...     print(i)
...
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Once I delete the iterable, what is the iterator referencing? How can it have a smaller memory footprint and still execute successfully? (I was under the impression that the generator simply points to existing data, but if that data is deleted, I have to shrug.)
What am I missing? (Something, clearly. Sorry, I'm self-taught.) Thanks in advance!
Upvotes: 2
Views: 522
Reputation: 70602
I'd make this a comment, but it needs formatting to be clear:
>>> x = [1, 2, 3]
>>> y = [x]
>>> del x
>>> y
[[1, 2, 3]]
At heart it's the same thing as your example: removing the binding for name x has no effect at all on the value of y, because y captured the object x was bound to at the time y was bound.
In the same way, the generator expression you bound to iterator captured the then-current binding of iterable at the time iterator was bound.
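One way to see this for yourself is the sketch below, which assumes CPython (reference counting is what makes the timing predictable). Corpus is a hypothetical list subclass, introduced only because plain lists cannot be weakly referenced; the weak reference lets you check whether the list object still exists without keeping it alive.

```python
import gc
import weakref


class Corpus(list):
    """Hypothetical list subclass: plain lists do not support weak references."""


iterable = Corpus([1, 2, 3])
probe = weakref.ref(iterable)          # observes the list without keeping it alive

iterator = (x + 1 for x in iterable)   # the genexp takes iter(iterable) right here
del iterable                           # removes the name, not the object

# The generator still holds the list (through its list iterator).
alive_after_del = probe() is not None

result = list(iterator)                # consume the generator

del iterator                           # drop the last holder of the list
gc.collect()                           # assumption: CPython; other runtimes may free later
alive_after_consume = probe() is not None

print(alive_after_del, result, alive_after_consume)
```

Note that iterator.__sizeof__() only measures the generator object itself, not the list it keeps alive, so the list is still fully in memory until the generator is exhausted or discarded.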
However, this isn't entirely straightforward in all cases. You can read the PEP that introduced generator expressions (PEP 289) for details. See especially the "Early Binding versus Late Binding" section, which says:
"Only the outermost for-expression is evaluated immediately, the other expressions are deferred until the generator is run"
You "got lucky" here (but in a very common way) because iterable was in "the outermost for-expression" of your generator expression. That's why the object iterable was bound to was captured at the time you created the genexp.
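To make the difference concrete, here is a small sketch of a case where you would not get lucky. The names outer, inner, and offset are made up for illustration; the point is only that anything other than the outermost iterable is looked up when the generator actually runs.

```python
outer = [1, 2, 3]
offset = 10

gen = (x + offset for x in outer)
outer = [100]        # no effect: iter(outer) was already taken when gen was created
offset = 20          # takes effect: offset is looked up each time the body runs
result = list(gen)   # [21, 22, 23]

inner = [10, 20]
nested = (a + b for a in [1, 2] for b in inner)
del inner            # inner is NOT the outermost iterable, so it was never captured

try:
    next(nested)
    raised = False
except NameError:    # the deferred lookup of inner fails only now
    raised = True

print(result, raised)
```

So deleting (or rebinding) a name is harmless only when that name supplies the outermost for-expression; names used anywhere else in the genexp must still be bound when you iterate.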
Upvotes: 9