Reputation: 331
I wonder if
my_iter = iter(iterable_obj)
copy the iterable_obj
?
In other words, does the above call cost additional memory?
Upvotes: 3
Views: 575
Reputation: 27629
It could copy, but it shouldn't. It should just provide iteration over the existing data structure with minimal memory overhead as necessary for the iteration. For example a list
iterator only stores a reference to the list as well as an index.
What does it do? That depends. The iter
function shall provide an iterator over any possible iterable in the whole wide world, including an iterable class you'll only write tomorrow, with complicated internal data structure. How can iter
possible do that? Artificial intelligence? Magic? No. Well ... actually yes, magic. Namely with so-called "magic methods" (or "dunder methods"). In this case, __iter__
or __getitem__
. The trick is, iter
doesn't know how to iterate the iterable. The iterable does. And makes the iteration accessible with one of those two magic methods. The iter
function is just a simple middle man between the code that calls it (which wants the iteration) and the iterable (which provides the iteration).
Example with an __iter__
method returning an iterator:
class MyIterable:
def __iter__(self):
return iter('abcde')
print(list(MyIterable()))
Output:
['a', 'b', 'c', 'd', 'e']
Example with a __getitem__
method returning elements for indexes 0, 1, 2, etc (until IndexError
):
class MyIterable:
def __getitem__(self, index):
return 'abcde'[index]
print(list(MyIterable()))
Output:
['a', 'b', 'c', 'd', 'e']
So what does iter(iterable)
do? Depends on what the iterable does. It might copy, it might not, it might try to set your house on fire.
For something as simple as a list iterator, the choice is obvious: Using a reference to the list and an index to where the iterator stands is both simple and efficient.
Let's consider a case where it's not so obvious and where you might be tempted to copy: A binary search tree iterator that offers iteration over the tree's values in sorted order. Let's consider three possible implementations, where n is the number of values in the tree. The tree will be represented as a structure of BST
node objects:
class BST:
def __init__(self, value, left=None, right=None):
self.value = value
self.left = left
self.right = right
class BST:
...
def __iter__(self):
if self.left:
yield from self.left
yield self.value
if self.right:
yield from self.right
Advantages:
Disadvantages:
Since slow iteration, especially quadratic time, is seriously disappointing, we could copy all values from the tree into a list and return an iterator over that list:
class BST:
...
def __iter__(self):
values = []
def collect(node):
if node:
collect(node.left)
values.append(node.value)
collect(node.right)
collect(self)
return iter(values)
Advantages:
Disadvantages:
Here's an iterative one using a stack. The stack will hold the nodes whose values and whose right subtrees still need to be iterated:
class BST:
...
def __iter__(self):
node = self
stack = []
while node or stack:
while node:
stack.append(node)
node = node.left
node = stack.pop()
yield node.value
node = node.right
Combines the advantages of the first two implementations (it's both time and memory efficient) but at the disadvantage of not being easy. Unlike for the first two implementations, I felt the need to add that little explanation for how it works, and you'll probably still need to think about it a bit if you haven't seen it before.
If it's just a little exercise for you and you don't have efficiency issues, the first two implementations are fine and easy to write. Although the one copying the values to a list isn't really a normal iterator, as copying the values is fundamentally not what iteration means. It's not about the memory, though. The recursive generator and the iterative approach take anywhere from O(log n) and O(n) memory as well, but that's organizational data and somewhat necessary to facilitate the iteration. They're not copying the content data.
If it's a BST package for serious use, then I'd find the disadvantages of the first two implementations unacceptable and would use the iterative implementation. More effort to write once, but with advantages and a proper iterator.
Btw if the nodes also had a reference to their parent node, an iterator could use that to do efficient iteration with O(1) memory. Left as exercise for the reader :-P
BST code to play with (Try it online!):
from random import shuffle
class BST:
def __init__(self, value, left=None, right=None):
self.value = value
self.left = left
self.right = right
def insert(self, value):
if value < self.value:
if self.left:
self.left.insert(value)
else:
self.left = BST(value)
elif value > self.value:
if self.right:
self.right.insert(value)
else:
self.right = BST(value)
def __repr__(self):
return f'BST({self.value}, {self.left}, {self.right})'
def __iter__(self):
yield from self.left or ()
yield self.value
yield from self.right or ()
def __iter__(self):
values = []
def collect(node):
if node:
collect(node.left)
values.append(node.value)
collect(node.right)
collect(self)
return iter(values)
def __iter__(self):
node = self
stack = []
while node or stack:
while node:
stack.append(node)
node = node.left
node = stack.pop()
yield node.value
node = node.right
# Build a random tree
values = list(range(20)) * 2
shuffle(values)
tree = BST(values[0])
for value in values[1:]:
tree.insert(value)
# Show the tree
print(tree)
# Iterate the tree in sorted order
print(list(tree))
Sample output:
BST(1, BST(0, None, None), BST(17, BST(10, BST(6, BST(2, None, BST(4, BST(3, None, None), BST(5, None, None))), BST(9, BST(7, None, BST(8, None, None)), None)), BST(15, BST(11, None, BST(13, BST(12, None, None), BST(14, None, None))), BST(16, None, None))), BST(18, None, BST(19, None, None))))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Upvotes: 1