Same iterator object yields different result in for loop?

Question

I came across some very strange behaviour in Python. With a class derived from UserDict, iterating over a.items() in a for loop behaves differently from iterating over a.data.items(), even though the two look identical:

Python 3.3.1 (default, Apr 17 2013, 22:32:14) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datastruct import QueueDict
>>> a=QueueDict(maxsize=1700)
>>> for i in range(1000):
...     a[str(i)]=1/(i+1)
... 
>>> a.items()
ItemsView(OrderedDict([('991', 0.0010080645161290322), ('992', 0.0010070493454179255), ('993', 0.001006036217303823), ('994', 0.0010050251256281408), ('995', 0.001004016064257028), ('996', 0.0010030090270812437), ('997', 0.001002004008016032), ('998', 0.001001001001001001), ('999', 0.001)]))
>>> a.data.items()
ItemsView(OrderedDict([('991', 0.0010080645161290322), ('992', 0.0010070493454179255), ('993', 0.001006036217303823), ('994', 0.0010050251256281408), ('995', 0.001004016064257028), ('996', 0.0010030090270812437), ('997', 0.001002004008016032), ('998', 0.001001001001001001), ('999', 0.001)]))
>>> a.items()==a.data.items()
True
>>> # nevertheless:
... 
>>> for item in a.items(): print(item)
... 
('992', 0.0010070493454179255)
>>> for item in a.data.items(): print(item)
... 
('993', 0.001006036217303823)
('994', 0.0010050251256281408)
('995', 0.001004016064257028)
('996', 0.0010030090270812437)
('997', 0.001002004008016032)
('998', 0.001001001001001001)
('999', 0.001)
('991', 0.0010080645161290322)
('992', 0.0010070493454179255)
>>>

The class definition is as follows:

import collections, sys

class QueueDict(collections.UserDict):

    def __init__(self, maxsize=1*((2**10)**2), *args, **kwargs ):
        self._maxsize=maxsize
        super().__init__(*args, **kwargs)
        self.data=collections.OrderedDict(self.data)

    def __getitem__(self, key):
        self.data.move_to_end(key)
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._purge()

    def _purge(self):
        while sys.getsizeof(self.data) > self._maxsize:
            self.data.popitem(last=False)

This is quite disturbing. Any ideas how the same object (the same by "visual" inspection, and also because a.items()==a.data.items() is True) can behave differently in the for loop, and why it does?

Thanks for your help and ideas!

Joachim Isaksson · Accepted Answer

Changing a collection while iterating can have (and in this case has) some unexpected consequences.

Your getter:

def __getitem__(self, key):
    self.data.move_to_end(key)
    return super().__getitem__(key)

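...moves the key being looked up to the end of the collection. The items view that UserDict inherits iterates over the keys and fetches each value through __getitem__, so the very first lookup moves the current key to the end. That makes the for loop over a.items() stop, since it thinks it has reached the end of the collection.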
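That the inherited items view goes through the getter can be checked with a small experiment. The following is only a sketch: RecordingDict and its lookups attribute are hypothetical names introduced here for illustration, not part of the question's code. It records every call to __getitem__ instead of reordering anything.

```python
import collections


class RecordingDict(collections.UserDict):
    """Hypothetical UserDict subclass that logs every __getitem__ call."""

    def __init__(self, *args, **kwargs):
        self.lookups = []  # keys passed to __getitem__, in call order
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        self.lookups.append(key)
        return super().__getitem__(key)


d = RecordingDict(a=1, b=2, c=3)

# Iterating the wrapper's items view calls __getitem__ once per key.
list(d.items())
lookups_via_wrapper = list(d.lookups)

# Iterating the underlying dict's view bypasses the wrapper entirely.
d.lookups.clear()
list(d.data.items())
lookups_via_data = list(d.lookups)

print(lookups_via_wrapper)  # ['a', 'b', 'c']
print(lookups_via_data)     # []
```

With a getter that calls move_to_end, each of those recorded lookups would reorder the OrderedDict in the middle of the iteration.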

Commenting out the move_to_end line allows the iteration to run as expected.

When you're iterating over a.data.items(), your getter is never invoked, so the problem does not occur there.
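If the reordering in the getter has to stay, one possible approach (an assumption on my part, not something the question's code does) is to let items() and values() delegate to the underlying OrderedDict, so that iteration never passes through __getitem__. A minimal sketch, with the hypothetical name LRUDict and the size-based purge left out for brevity:

```python
import collections


class LRUDict(collections.UserDict):
    """Hypothetical variant of QueueDict without the size-based purge."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data = collections.OrderedDict(self.data)

    def __getitem__(self, key):
        value = super().__getitem__(key)  # raises KeyError for missing keys
        self.data.move_to_end(key)        # mark the key as most recently used
        return value

    def items(self):
        # Delegate to the underlying OrderedDict so that iterating the view
        # does not go through __getitem__ and reorder keys mid-iteration.
        return self.data.items()

    def values(self):
        # Same reasoning as items(): read values without touching the order.
        return self.data.values()


d = LRUDict()
for i in range(5):
    d[str(i)] = i

pairs = list(d.items())       # all five items, in insertion order
_ = d["0"]                    # an explicit lookup still moves the key
order_after_lookup = list(d)  # ['1', '2', '3', '4', '0']
```

Explicit lookups such as d["0"] still update the order, while loops over items() or values() leave it alone. Whether iteration should count as "use" is a design choice; this sketch assumes it should not.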
