Leonardo

Reputation: 1891

Creating a non-iterator iterable

I was reading "What exactly are iterator, iterable, and iteration?" and "Build a basic Python iterator" when I realized I don't understand how an iterable class must be implemented in practice.

Say that I have the following class:

class MyClass():
    def __init__(self, num):
        self.num = num
        self.count = 0

    def __len__(self):
        return self.num

    def __iter__(self):
        return self

    def __next__(self):
        if self.count < self.num:
            v = self.count
            self.count += 1
            return v
        else:
            self.count = 0
            raise StopIteration

That class is iterable because it "has an __iter__ method which returns an iterator"*1. Objects of MyClass are also iterators because "an iterator is an object with a next (Python 2) or __next__ (Python 3) method"*1. So far so good.

What's confusing me is a comment that stated "iterators are only supposed to be iterated once"*2. I don't understand why the following snippet gets stuck forever:

>>> y = MyClass(5)
>>> print([[i for i in y] for i in y])

The fix, of course, is to not reset the count member:

    def __next__(self):
        if self.count < self.num:
            v = self.count
            self.count += 1
            return v
        else:
            raise StopIteration

But now the list comprehension has to create new objects in the inner loop:

>>> y = MyClass(5)
>>> print([[i for i in MyClass(5)] for i in y])
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

Now, let's say that I want to be able to iterate over my object many times. I tried to implement a non-iterator iterable class with:

class MyIterator():
    def __init__(self, num):
        self.num = num
        self.count = 0

    def __len__(self):
        return self.num

    def __iter__(self):
        return self.my_iterator()

    def my_iterator(self):
        while self.count < self.num:
            yield self.count
            self.count += 1
        self.count = 0

This works perfectly:

>>> x = MyIterator(5)
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print(list(x))
[0, 1, 2, 3, 4]

But the nested comprehension gets stuck:

>>> x = MyIterator(5)
>>> print([[i for i in x] for i in x])

And again the fix is to remove the line that resets the internal counter:

    def my_iterator(self):
        while self.count < self.num:
            yield self.count
            self.count += 1

And change the comprehension to create new objects in the inner loop:

>>> print([[i for i in MyIterator(5)] for i in x])
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

But the "fixed" class can't be iterated over more than once:

>>> x = MyIterator(5)
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print(list(x))
[]

What's the correct way to implement a non-iterator iterable (note that I think I followed the last comment in this answer to the letter)? Or is this use case explicitly not supported by Python?

Edit:

Classic case of rubber duck debugging: I changed the last class to:

class MyIteratorFixed():
    def __init__(self, num):
        self.num = num

    def __len__(self):
        return self.num

    def __iter__(self):
        return self.my_iterator_fixed()

    def my_iterator_fixed(self):
        count = 0
        while count < self.num:
            yield count
            count += 1

My mistake was keeping count as a member. I didn't need it, because each call to the generator method already holds its own state (in this particular case, the value of the local variable count).

>>> x = MyIteratorFixed(5)
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print(list(x))
[0, 1, 2, 3, 4]
>>> print([[i for i in x] for i in x])
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

My questions are now:

  1. Is this the correct way to implement a non-iterator iterable?
  2. When should I use an iterator and when should I use a non-iterator iterable? Is the only distinction that one of them can be iterated just once?
  3. What are the drawbacks of a non-iterator iterable compared to an iterator?

Thanks!!

Upvotes: 3

Views: 979

Answers (3)

Leonardo

Reputation: 1891

My last iteration takes the hint from this answer:

class MyIterator():
    def __init__(self, num):
        self.num = num

    def __iter__(self):
        count = 0
        while count < self.num:
            yield count
            count += 1

Upvotes: 0

fsimonjetz

Reputation: 5802

I figured a real-life example of a non-iterator iterable might be helpful: I usually work with language data and often implement some kind of container class for documents that holds the words, sentences, part-of-speech tags, syntactic information etc., but the central structure is usually some list of tokens:

class Document:
    def __init__(self, wordlist):
        self.tokens = wordlist

doc = Document(['Hello', 'World', '!'])

Whenever I need to iterate over the tokens, I could do for w in doc.tokens, but that's too cumbersome. So I would normally add an __iter__ method that returns an iterator over the stored tokens:

class Document:
    def __init__(self):
        self.tokens = ['Hello', 'World', '!']

    def __iter__(self):
        # Delegate to the list's own iterator; a new one is created per loop.
        return iter(self.tokens)

Now I can do for w in doc:, which can be done any number of times, and if the loop is broken off partway, the next loop restarts from the first word, a behavior that seems quite natural to work with. But the object itself is not an iterator (because __next__() isn't implemented).
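A short sketch of the behavior described above, using the Document constructor that takes a word list (the names seen, again and is_iterator are only for illustration):

```python
class Document:
    def __init__(self, wordlist):
        self.tokens = wordlist

    def __iter__(self):
        # A new list iterator is created for every loop.
        return iter(self.tokens)


doc = Document(['Hello', 'World', '!'])

# Break out of the first loop early.
seen = []
for w in doc:
    seen.append(w)
    if w == 'World':
        break

# The next loop starts from the first token again.
again = list(doc)

# The document itself has no __next__, so it is not an iterator.
is_iterator = hasattr(doc, '__next__')

print(seen)         # ['Hello', 'World']
print(again)        # ['Hello', 'World', '!']
print(is_iterator)  # False
```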

Upvotes: 0

chepner

Reputation: 530843

  1. Yes, this is correct.

  2. Usually, you want your iterator to be separate from the thing being iterated: it makes for a nice separation of concerns.

  3. There are few, if any, drawbacks. Most iterable classes in Python do not act as their own iterators; file-like objects (which wrap file descriptors that already maintain their own file pointer) are the only exceptions that come to mind. For example, the built-in containers each return a separate iterator object:

    >>> type(iter([]))
    <class 'list_iterator'>
    >>> type(iter(()))
    <class 'tuple_iterator'>
    >>> type(iter({}))
    <class 'dict_keyiterator'>
    >>> type(iter(set()))
    <class 'set_iterator'>
    

    None of the four types considered implement __iter__ by returning the object itself; they all return instances of a separate class.
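The separation mentioned in point 2 can also be written out explicitly instead of using a generator. This is only a sketch: MyIterable and MyIterableIterator are hypothetical names, and io.StringIO stands in for a file-like object to show the exception mentioned in point 3:

```python
import io


class MyIterable:
    """Iterable: holds the data and hands out a new iterator on request."""

    def __init__(self, num):
        self.num = num

    def __iter__(self):
        return MyIterableIterator(self.num)


class MyIterableIterator:
    """Iterator: holds only the position of a single traversal."""

    def __init__(self, num):
        self.num = num
        self.count = 0

    def __iter__(self):
        # Iterators return themselves from __iter__.
        return self

    def __next__(self):
        if self.count >= self.num:
            raise StopIteration
        v = self.count
        self.count += 1
        return v


x = MyIterable(3)
nested = [[i for i in x] for _ in x]
print(nested)              # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(iter(x) is x)        # False: the iterator is a separate object

buf = io.StringIO("one\ntwo\n")
print(iter(buf) is buf)    # True: a file-like object is its own iterator
```

Because every loop gets its own MyIterableIterator, nested and repeated loops over the same MyIterable do not interfere with each other.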

Upvotes: 5
