Reputation: 331

Does python iterator cost additional memory?

I wonder if my_iter = iter(iterable_obj) copy the iterable_obj? In other words, does the above call cost additional memory?

Upvotes: 3

Answers (1)

Kelly Bundy

Reputation: 27629

Does it copy? Maybe. But shouldn't.

It could copy, but it shouldn't. It should just provide iteration over the existing data structure with minimal memory overhead as necessary for the iteration. For example a list iterator only stores a reference to the list as well as an index.

What does it do?

What does it do? That depends. The iter function shall provide an iterator over any possible iterable in the whole wide world, including an iterable class you'll only write tomorrow, with complicated internal data structure. How can iter possible do that? Artificial intelligence? Magic? No. Well ... actually yes, magic. Namely with so-called "magic methods" (or "dunder methods"). In this case, __iter__ or __getitem__. The trick is, iter doesn't know how to iterate the iterable. The iterable does. And makes the iteration accessible with one of those two magic methods. The iter function is just a simple middle man between the code that calls it (which wants the iteration) and the iterable (which provides the iteration).

Example with an __iter__ method returning an iterator:

class MyIterable:
    def __iter__(self):
        return iter('abcde')

print(list(MyIterable()))

Output:

['a', 'b', 'c', 'd', 'e']

Example with a __getitem__ method returning elements for indexes 0, 1, 2, etc (until IndexError):

class MyIterable:
    def __getitem__(self, index):
        return 'abcde'[index]

print(list(MyIterable()))

Output:

['a', 'b', 'c', 'd', 'e']

So what does iter(iterable) do? Depends on what the iterable does. It might copy, it might not, it might try to set your house on fire.

For something as simple as a list iterator, the choice is obvious: Using a reference to the list and an index to where the iterator stands is both simple and efficient.

More interesting case: Binary search tree iterator

Let's consider a case where it's not so obvious and where you might be tempted to copy: A binary search tree iterator that offers iteration over the tree's values in sorted order. Let's consider three possible implementations, where n is the number of values in the tree. The tree will be represented as a structure of BST node objects:

class BST:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

Possible implementation 1: Recursive iterator

class BST:
    ...
    def __iter__(self):
        if self.left:
            yield from self.left
        yield self.value
        if self.right:
            yield from self.right

Advantages:

Easy code.
O(1) time and space for iterator creation.
Lazy, iterates only as much as then requested.
Only O(h) memory during iteration, where h is the height of the tree. Could be as low as Θ(log n) if the tree is balanced, or as high as Θ(n) if it's very unbalanced.

Disadvantages:

Slow. Every value gets passed through the entire stack of iterators up to the root. So iterating the whole tree takes at least Θ(n log n) and up to Θ(n²) time.

Possible implementation 2: Copy values into list

Since slow iteration, especially quadratic time, is seriously disappointing, we could copy all values from the tree into a list and return an iterator over that list:

class BST:
    ...
    def __iter__(self):
        values = []
        def collect(node):
            if node:
                collect(node.left)
                values.append(node.value)
                collect(node.right)
        collect(self)
        return iter(values)

Advantages:

Easy code.
Linear time iteration.

Disadvantages:

Θ(n) memory.
Θ(n) time already for creating the iterator, before even starting actual iteration.

Possible implementation 3: Iterative

Here's an iterative one using a stack. The stack will hold the nodes whose values and whose right subtrees still need to be iterated:

class BST:
    ...
    def __iter__(self):
        node = self
        stack = []
        while node or stack:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.value
            node = node.right

Combines the advantages of the first two implementations (it's both time and memory efficient) but at the disadvantage of not being easy. Unlike for the first two implementations, I felt the need to add that little explanation for how it works, and you'll probably still need to think about it a bit if you haven't seen it before.

Conclusion

If it's just a little exercise for you and you don't have efficiency issues, the first two implementations are fine and easy to write. Although the one copying the values to a list isn't really a normal iterator, as copying the values is fundamentally not what iteration means. It's not about the memory, though. The recursive generator and the iterative approach take anywhere from O(log n) and O(n) memory as well, but that's organizational data and somewhat necessary to facilitate the iteration. They're not copying the content data.

If it's a BST package for serious use, then I'd find the disadvantages of the first two implementations unacceptable and would use the iterative implementation. More effort to write once, but with advantages and a proper iterator.

Btw if the nodes also had a reference to their parent node, an iterator could use that to do efficient iteration with O(1) memory. Left as exercise for the reader :-P

Code

BST code to play with (Try it online!):

from random import shuffle

class BST:

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def insert(self, value):
        if value < self.value:
            if self.left:
                self.left.insert(value)
            else:
                self.left = BST(value)
        elif value > self.value:
            if self.right:
                self.right.insert(value)
            else:
                self.right = BST(value)

    def __repr__(self):
        return f'BST({self.value}, {self.left}, {self.right})'

    def __iter__(self):
        yield from self.left or ()
        yield self.value
        yield from self.right or ()

    def __iter__(self):
        values = []
        def collect(node):
            if node:
                collect(node.left)
                values.append(node.value)
                collect(node.right)
        collect(self)
        return iter(values)

    def __iter__(self):
        node = self
        stack = []
        while node or stack:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.value
            node = node.right

# Build a random tree
values = list(range(20)) * 2
shuffle(values)
tree = BST(values[0])
for value in values[1:]:
   tree.insert(value)

# Show the tree
print(tree)

# Iterate the tree in sorted order
print(list(tree))

Sample output:

BST(1, BST(0, None, None), BST(17, BST(10, BST(6, BST(2, None, BST(4, BST(3, None, None), BST(5, None, None))), BST(9, BST(7, None, BST(8, None, None)), None)), BST(15, BST(11, None, BST(13, BST(12, None, None), BST(14, None, None))), BST(16, None, None))), BST(18, None, BST(19, None, None))))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Upvotes: 1