Given a dict iterator, get the dict

Question

Given a list iterator, you can find the original list via the pickle protocol:

>>> L = [1, 2, 3]
>>> Li = iter(L)
>>> Li.__reduce__()[1][0] is L
True

Given a dict iterator, how can you find the original dict? I could only find a hacky way that relies on CPython implementation details (via the garbage collector):

>>> import gc
>>> def get_dict(dict_iterator):
...     [d] = gc.get_referents(dict_iterator)
...     return d
...
>>> d = {}
>>> get_dict(iter(d)) is d
True

Martijn Pieters · Accepted Answer

There is no API to find the source iterable object from an iterator. This is intentional: iterators are seen as single-use objects that you iterate over and then discard. As such, they often drop their reference to the iterable once they have reached the end; what's the point of keeping it if you can't get more elements anyway?
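
To make the single-use point concrete, here is a minimal sketch (the variable names are only illustrative). It relies on the iterator protocol's rule that an iterator which has raised StopIteration must keep doing so, which is why an exhausted iterator has no further use for its source:

```python
# List iterator: appending to the list after exhaustion does not revive it.
items = [1, 2]
list_it = iter(items)
first_pass = list(list_it)    # consumes every element
items.append(3)
second_pass = list(list_it)   # the exhausted iterator yields nothing more

# Dict key iterator: same behaviour.
d = {'foo': 42}
dict_it = iter(d)
dict_first = list(dict_it)
d['bar'] = 7
dict_second = list(dict_it)   # still exhausted, the new key is not seen
```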

You can see this in both the list and dict iterators: the hacks you found produce either empty objects or None once you are done iterating. List iterators use an empty list when pickled:

>>> l = [1]
>>> it = iter(l)
>>> it.__reduce__()[1][0] is l
True
>>> list(it)  # exhaust the iterator
[1]
>>> it.__reduce__()[1][0] is l
False
>>> it.__reduce__()[1][0]
[]

and the dictionary iterator just sets the pointer to the original dictionary to null, so there are no referents left after that:

>>> import gc
>>> it = iter({'foo': 42})
>>> gc.get_referents(it)
[{'foo': 42}]
>>> list(it)
['foo']
>>> gc.get_referents(it)
[]

Both of your hacks are just that: hacks. They are implementation-dependent and can, and probably will, change between Python releases. Currently, iter(dictionary).__reduce__() gets you the equivalent of iter, list(copy(self)) rather than access to the dictionary, because that's deemed a better implementation, but future versions might use something different altogether.
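
As a rough illustration of that `__reduce__()` behaviour, the sketch below shows what I'd expect on a recent CPython 3 release (3.7+ for dict ordering). Treat the exact return value as an implementation detail that may differ on other versions or interpreters:

```python
d = {'foo': 42, 'bar': 7}
it = iter(d)
next(it)                       # consume 'foo'

# On CPython 3 this is (iter, (list_of_remaining_keys,)).
func, args = it.__reduce__()
remaining = args[0]            # a plain list copy, not the dict itself

# Reducing works on a copy of the iterator state, so `it` is not advanced.
next_key = next(it)
```

The pickle payload only carries the keys that are left, so there is no way to get from it back to the dictionary.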

For dictionaries, the only other option currently available is to access the di_dict pointer in the dictiter struct, with ctypes:

import ctypes

class PyObject_HEAD(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

class dictiterobject(ctypes.Structure):
    _fields_ = [
        ("ob_base", PyObject_HEAD),
        ("di_dict", ctypes.py_object),
        ("di_used", ctypes.c_ssize_t),
        ("di_pos", ctypes.c_ssize_t),
        ("di_result", ctypes.py_object),  # always NULL for dictkeys_iter
        ("len", ctypes.c_ssize_t),
    ]

def dict_from_dictiter(it):
    di = dictiterobject.from_address(id(it))
    try:
        return di.di_dict
    except ValueError:  # null pointer
        return None

This is just as much of a hack as relying on gc.get_referents():

>>> d = {'foo': 42}
>>> it = iter(d)
>>> dict_from_dictiter(it)
{'foo': 42}
>>> dict_from_dictiter(it) is d
True
>>> list(it)
['foo']
>>> dict_from_dictiter(it) is None
True

For now, at least in CPython versions up to and including Python 3.8, there are no other options available.
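
If you control the code that creates the iterator, you can sidestep the problem by keeping the reference yourself. This is only a sketch of that idea, not an existing API: `SourceTrackingIterator` and its `source` attribute are hypothetical names made up for this example.

```python
class SourceTrackingIterator:
    """Hypothetical helper: an iterator that remembers its source iterable."""

    def __init__(self, source):
        self.source = source      # strong reference, kept even after exhaustion
        self._it = iter(source)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegate to the real iterator; StopIteration propagates as usual.
        return next(self._it)


d = {'foo': 42}
it = SourceTrackingIterator(d)
keys = list(it)                   # exhaust the wrapper
same_dict = it.source is d        # the reference is still available
```

Unlike the hacks above, this uses only documented behaviour, but it does nothing for iterators you did not create yourself.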
