Reputation: 1891
I was reading "What exactly are iterator, iterable, and iteration?" and "Build a basic Python iterator" when I realized that I don't understand, in practice, how an iterable class must be implemented.
Say that I have the following class:
class MyClass():
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __len__(self):
        return self.num
    def __iter__(self):
        return self
    def __next__(self):
        if self.count < self.num:
            v = self.count
            self.count += 1
            return v
        else:
            self.count = 0
            raise StopIteration
That class is iterable because it "has an __iter__ method which returns an iterator"*1. Objects of MyClass are also iterators because "an iterator is an object with a next (Python 2) or __next__ (Python 3) method."*1. So far so good.
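Both claims can be checked with the abstract base classes in collections.abc: Iterable only looks for __iter__, while Iterator looks for both __iter__ and __next__. A minimal sketch, assuming Python 3, that re-declares the class so it runs on its own:

```python
from collections.abc import Iterable, Iterator

class MyClass:
    """Same behaviour as the MyClass above, including the counter reset."""
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __len__(self):
        return self.num
    def __iter__(self):
        return self  # the object hands itself out as its own iterator
    def __next__(self):
        if self.count < self.num:
            v = self.count
            self.count += 1
            return v
        self.count = 0  # reset, as in the question
        raise StopIteration

y = MyClass(3)
print(isinstance(y, Iterable))  # True: it has __iter__
print(isinstance(y, Iterator))  # True: it has __iter__ and __next__
print(iter(y) is y)             # True: every loop gets the very same object
```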
What's confusing me is a comment that stated "iterators are only supposed to be iterated once"*2. I don't understand why the following snippet gets stuck forever:
>>> y = MyClass(5)
>>> print([[i for i in y] for i in y])
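One way to see where it goes wrong is to run the same nested iteration by hand. Both loops call iter(y), get y itself back, and therefore share a single count; the inner loop drains it, and the reset in __next__ sets it back to 0, so the outer loop never reaches the end. The break after three outer steps is an artificial cap added only so this sketch terminates:

```python
class MyClass:
    """The question's MyClass, with the counter reset in __next__."""
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.count < self.num:
            v = self.count
            self.count += 1
            return v
        self.count = 0
        raise StopIteration

y = MyClass(5)
outer_values = []
for i in y:                    # outer loop: iter(y) returns y itself
    outer_values.append(i)
    inner = [j for j in y]     # inner loop: also y itself, drains 1..4, then resets count to 0
    if len(outer_values) == 3: # hypothetical cap, only so the demo stops
        break

print(outer_values)  # [0, 0, 0]: the outer loop keeps restarting at 0
print(inner)         # [1, 2, 3, 4]: the outer loop already consumed the 0
```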
The fix, of course, is to not reset the count member:
def __next__(self):
    if self.count < self.num:
        v = self.count
        self.count += 1
        return v
    else:
        raise StopIteration
But now the list comprehension has to create new objects in the inner loop:
>>> y = MyClass(5)
>>> print([[i for i in MyClass(5)] for i in y])
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
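Without the reset, MyClass follows the iterator protocol: the Python documentation on iterator types says that once __next__ raises StopIteration, it should keep doing so on later calls. A short sketch of what that means in practice; a nested comprehension over a single object now terminates, but produces only one row:

```python
class MyClass:
    """The question's MyClass with the counter reset removed."""
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.count < self.num:
            v = self.count
            self.count += 1
            return v
        raise StopIteration  # stays exhausted from now on

y = MyClass(5)
first = list(y)
second = list(y)
print(first, second)  # [0, 1, 2, 3, 4] []

z = MyClass(5)
nested = [[i for i in z] for i in z]
print(nested)  # [[1, 2, 3, 4]]: outer took 0, inner drained the rest
```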
Now, let's say that I want to be able to iterate over my object many times. I tried to implement a non-iterator iterable class with:
class MyIterator():
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __len__(self):
        return self.num
    def __iter__(self):
        return self.my_iterator()
    def my_iterator(self):
        while self.count < self.num:
            yield self.count
            self.count += 1
        self.count = 0
This works perfectly:
>>> x = MyIterator(5)
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print(list(x))
[0, 1, 2, 3, 4]
But the nested comprehension gets stuck:
>>> x = MyIterator(5)
>>> print([[i for i in x] for i in x])
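The reason this version still loops is that each call to __iter__ does create a new generator, but all of those generators read and write the same self.count. The sketch below interleaves two of them by hand to make the shared state visible:

```python
class MyIterator:
    """The question's MyIterator: generators share self.count."""
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __iter__(self):
        return self.my_iterator()
    def my_iterator(self):
        while self.count < self.num:
            yield self.count
            self.count += 1
        self.count = 0

x = MyIterator(5)
a = iter(x)
b = iter(x)
print(a is b)  # False: two distinct generator objects...
# ...but both step the same counter stored on x
values = [next(a), next(b), next(b), next(a)]
print(values)  # [0, 0, 1, 2]
```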
And again the fix is to remove the line that resets the internal counter:
def my_iterator(self):
    while self.count < self.num:
        yield self.count
        self.count += 1
And change the comprehension to create new objects in the inner loop:
>>> print([[i for i in MyIterator(5)] for i in x])
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
But the "fixed" class can't be iterated over more than once:
>>> x = MyIterator(5)
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print(list(x))
[]
What's the correct way to implement a non-iterator iterable (note that I think I followed the last comment in this answer to the letter)? Or is this use case explicitly not supported by Python?
Edit:
Classic case of rubber duck debugging: I changed the last class to:
class MyIteratorFixed():
    def __init__(self, num):
        self.num = num
    def __len__(self):
        return self.num
    def __iter__(self):
        return self.my_iterator_fixed()
    def my_iterator_fixed(self):
        count = 0
        while count < self.num:
            yield count
            count += 1
What I had wrong is that I didn't need a count member: each call to __iter__ creates a new generator object, and Python keeps that generator's local state (in this particular case, the value of count) separately for each one.
>>> x = MyIteratorFixed(5)
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print([[i for i in x] for i in x])
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
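Because count is now a local variable, every generator returned by __iter__ carries its own position. A sketch that interleaves two of them, plus zip(x, x), which calls iter(x) twice and only produces matching pairs because the two iterators are independent:

```python
from collections.abc import Iterable, Iterator

class MyIteratorFixed:
    def __init__(self, num):
        self.num = num
    def __len__(self):
        return self.num
    def __iter__(self):
        return self.my_iterator_fixed()
    def my_iterator_fixed(self):
        count = 0  # local to each generator object, not stored on self
        while count < self.num:
            yield count
            count += 1

x = MyIteratorFixed(3)
a, b = iter(x), iter(x)
values = [next(a), next(b), next(b), next(a)]
print(values)  # [0, 0, 1, 1]
pairs = list(zip(x, x))
print(pairs)   # [(0, 0), (1, 1), (2, 2)]
print(isinstance(x, Iterable), isinstance(x, Iterator))  # True False
```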
My questions are now:
Thanks!!
Upvotes: 3
Views: 979
Reputation: 1891
My last iteration takes the hint from this answer:
class MyIterator():
    def __init__(self, num):
        self.num = num
    def __iter__(self):
        count = 0
        while count < self.num:
            yield count
            count += 1
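Since the body of __iter__ contains yield, calling it returns a fresh generator object each time, so the class is iterable without being an iterator itself. A small check of that (num=3 is an arbitrary choice):

```python
import types
from collections.abc import Iterable, Iterator

class MyIterator:
    def __init__(self, num):
        self.num = num
    def __iter__(self):
        # `yield` makes __iter__ a generator function:
        # each call returns a new, independent generator
        count = 0
        while count < self.num:
            yield count
            count += 1

x = MyIterator(3)
it = iter(x)
print(isinstance(it, types.GeneratorType))               # True
print(isinstance(x, Iterable), isinstance(x, Iterator))  # True False
nested = [[i for i in x] for i in x]
print(nested)  # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```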
Upvotes: 0
Reputation: 5802
I figured a real-life example of a non-iterator iterable might be helpful: I usually work with language data and often implement some kind of container class for documents that holds the words, sentences, part-of-speech tags, syntactic information etc., but the central structure is usually some list of tokens:
class Document:
    def __init__(self, wordlist):
        self.tokens = wordlist

doc = Document(['Hello', 'World', '!'])
Whenever I need to iterate over the tokens, I could write for w in doc.tokens, but that's too cumbersome. So I would normally add an __iter__ method that returns an iterator over the stored tokens:
class Document:
    def __init__(self, wordlist):
        self.tokens = wordlist
    def __iter__(self):
        # hand out a fresh list iterator on every call
        return iter(self.tokens)
Now I can write for w in doc: as many times as I like, and if a loop is broken off partway through, the next loop restarts from the first word, which seems like quite a natural behavior to work with. But the object itself is not an iterator, because __next__() isn't implemented.
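That behaviour can be checked with a short sketch using the Document class with a wordlist argument: breaking out of one loop does not affect the next, and calling next() on the document itself raises TypeError because it has no __next__:

```python
class Document:
    def __init__(self, wordlist):
        self.tokens = wordlist
    def __iter__(self):
        return iter(self.tokens)  # new list iterator per loop

doc = Document(['Hello', 'World', '!'])

for w in doc:
    first = w
    break                 # abandon this loop early

print(first)              # Hello
print(list(doc))          # ['Hello', 'World', '!']: starts over from the first word

try:
    next(doc)             # Document has no __next__
    is_iterator = True
except TypeError:
    is_iterator = False
print(is_iterator)        # False
```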
Upvotes: 0
Reputation: 530843
Yes, this is correct.
Usually, you want your iterator to be separate from the thing being iterated: it makes for a nice separation of concerns.
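As a sketch of that separation without generators, the iterable can hand out instances of a dedicated iterator class that holds the per-loop state. Counter and CounterIterator are hypothetical names used only for this illustration:

```python
class CounterIterator:
    """Holds the state of one pass; a new instance per loop."""
    def __init__(self, num):
        self.num = num
        self.count = 0
    def __iter__(self):
        return self  # iterators return themselves from __iter__
    def __next__(self):
        if self.count >= self.num:
            raise StopIteration  # stays exhausted once finished
        v = self.count
        self.count += 1
        return v

class Counter:
    """The iterable: stores the data and hands out fresh iterators."""
    def __init__(self, num):
        self.num = num
    def __len__(self):
        return self.num
    def __iter__(self):
        return CounterIterator(self.num)

c = Counter(3)
print(list(c), list(c))  # [0, 1, 2] [0, 1, 2]
rows = [[i for i in c] for i in c]
print(rows)              # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```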
There are few, if any, drawbacks to keeping them separate. Most iterable classes in Python do not act as their own iterators; file-like objects (which wrap file descriptors that already maintain their own file pointer) are the only exceptions that come to mind. For example,
>>> type(iter([]))
<class 'list_iterator'>
>>> type(iter(()))
<class 'tuple_iterator'>
>>> type(iter({}))
<class 'dict_keyiterator'>
>>> type(iter(set()))
<class 'set_iterator'>
None of the four types considered implements __iter__ by returning the object itself; they all return instances of a separate class.
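The same check can be run directly and contrasted with a file-like object. io.StringIO stands in for a real file here so the sketch needs nothing on disk; like other io objects it returns itself from __iter__, so every loop over it shares one read position:

```python
import io

# Built-in containers: iter() returns a separate iterator object
separate = {type(obj).__name__: iter(obj) is not obj
            for obj in ([], (), {}, set())}
print(separate)  # {'list': True, 'tuple': True, 'dict': True, 'set': True}

# File-like object: iter() returns the object itself
f = io.StringIO("first\nsecond\n")
print(iter(f) is f)   # True
first_line = next(f)
rest = list(f)        # continues from where next() left off
print(repr(first_line), rest)  # 'first\n' ['second\n']
```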
Upvotes: 5