what are the rules for iterator invalidation?

Question

Is there a general rule that all modules follow?

In my case, I'm using Python's xml.etree library.

Let's say I do this:

for el in root.iter('*'):
    for subel in el:
        el.remove(subel)

Does that break the el iterator?

mgilson · Accepted Answer

There are lots of cases where modifying an iterable while iterating over it causes problems. For example, iteration over an XML tree can get messed up when an element is removed during iteration. There are also plenty of questions on Stack Overflow about surprising results when removing items from a list while iterating over it:

>>> lst = [1, 1, 2, 3]
>>> for item in lst:
...     if item == 1:
...         lst.remove(item)
... 
>>> print(lst)
[1, 2, 3]

(Note that there is still a 1 in the output list.)
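To see why a 1 survives, it can help to record which positions the loop actually visits. The sketch below is only an illustration (the `visited` list is a name I've made up for it), and it reflects how CPython's list iterator behaves:

```python
# Record the (index, item) pairs the loop actually sees while the list shrinks.
lst = [1, 1, 2, 3]
visited = []
for index, item in enumerate(lst):
    visited.append((index, item))
    if item == 1:
        # Removing shifts every later element one slot to the left,
        # so the second 1 slides into index 0, which was already visited.
        lst.remove(item)

print(visited)  # [(0, 1), (1, 2), (2, 3)]
print(lst)      # [1, 2, 3]
```

The second 1 is never handed to the loop body: the iterator moves on to index 1, and by then that slot holds the 2.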

So the general rule is that you probably shouldn't add or remove items while an iterator is doing its thing. If you don't know how the iterator is implemented, this is by far the safest tack. However, some iterators are documented to work in specific ways. Take the list example above: it turns out that we can remove the current element (or elements at lower indices) if we iterate over the list in reverse:

>>> lst = [1, 1, 2, 3]
>>> for item in reversed(lst):
...     if item == 1:
...         lst.remove(item)
... 
>>> print(lst)
[2, 3]

This is due to certain guarantees that are made by the list iterator. Note that because of the general rule I listed above, I wouldn't advise doing this (it'll probably make your code's readers scratch their heads trying to figure out why you're iterating over the list backwards).

For the list case, you'll see people iterating over a copy of a list if they're planning on removing/adding elements, but it's harder to give advice for the general iterator case without knowing more about the constraints of the problem.
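Here is a minimal sketch of the copy approach, applied both to the list above and to the loop from the question. The sample XML string is made up for illustration, and I'm assuming the tree is small enough that taking a snapshot with `list()` is acceptable:

```python
import xml.etree.ElementTree as ET

# List case: iterate over a shallow copy and mutate the original.
lst = [1, 1, 2, 3]
for item in lst[:]:
    if item == 1:
        lst.remove(item)
print(lst)  # [2, 3]

# ElementTree case: take snapshots so that removals never touch
# the sequence currently being iterated.
# (Hypothetical sample document, not from the question.)
root = ET.fromstring("<root><a><b/><c/></a><d><e/></d></root>")
for el in list(root.iter()):   # snapshot of every element in the tree
    for subel in list(el):     # snapshot of this element's direct children
        el.remove(subel)

print(len(root))  # number of children left under the root
```

Both loops walk over a snapshot that removals can't disturb, so no element is skipped. The cost is the extra memory for the copies.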
