Jsevillamol

Reputation: 2543

Complexity of iterating over a dictionary in Python

This is a fairly simple question I have not been able to find an answer for. If I have a dictionary, what is the complexity of iterating over it?

In other words, what is the time complexity of a dictionary traversal such as for key in my_dict: print(key)?

My naive understanding is that since dictionaries in Python are hashmaps, we would need to iterate over all possible hash values of the dictionary.

That seems like overkill, but maybe it is fine, since the dictionary grows progressively as we add elements, so the cost is amortized by keeping the dictionary almost full, up to a constant load factor?

Upvotes: 1

Views: 5416

Answers (1)

kaya3

Reputation: 51037

In most cases, iterating over a dictionary takes O(n) time in total, or on average O(1) time per element, where n is the number of items in the dictionary.
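As a rough illustration (an informal timing sketch, not a rigorous benchmark; exact numbers will vary by machine and Python version), the total time for a full pass grows roughly linearly with n:

    # Informal sketch: total iteration time grows roughly linearly with
    # the number of items in the dictionary.
    import timeit

    for n in [10_000, 100_000, 1_000_000]:
        d = {i: i for i in range(n)}
        t = timeit.timeit(lambda: sum(1 for _ in d), number=10)
        print(f"n={n:>9,}: {t:.4f}s for 10 full iterations")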

There are various versions of Python's dictionary data structure, depending on which version of Python you're using, but all of them are some kind of hashtable. A hashtable either has an array of key/value pairs, or an array of keys and a parallel array of values. Usually, a fixed proportion of the array (known as the load factor) contains dictionary items, with the rest of the spaces remaining empty, so the length of the array you need to iterate over is a fixed constant times the number of dictionary items. This means you can iterate in O(n) time.
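Here is a toy model of that layout (illustrative only; the names and structure are assumptions for the sketch, not CPython's actual internals):

    # Toy model of iterating a hashtable's backing array: the array is a
    # constant factor larger than the item count, and iteration simply
    # skips the empty slots, so a full pass is O(n) overall.
    EMPTY = object()   # sentinel marking an unused slot

    table = [EMPTY, ("a", 1), EMPTY, ("b", 2), ("c", 3), EMPTY, EMPTY, ("d", 4)]

    for slot in table:             # len(table) is about n / load_factor, i.e. O(n)
        if slot is not EMPTY:
            key, value = slot
            print(key, value)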

In recent versions of CPython (3.6 and later), the dictionary's lookup array simply holds indices into another array, in which the items are kept in insertion order. This additional array can be used to iterate over the dictionary in insertion order, still in O(n) time, but without having to skip over unused spaces in the lookup array.
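A similarly simplified sketch of that compact layout (again, illustrative names, not CPython's real structures): a sparse index array used for hash lookups, plus a dense entries list kept in insertion order. Iteration walks only the dense list.

    # Sparse lookup array: each non-None value is an index into `entries`.
    indices = [None, 2, None, 0, None, None, 1, None]  # sized by capacity
    # Dense entries array, kept in insertion order.
    entries = [("x", 10), ("y", 20), ("z", 30)]

    for key, value in entries:     # O(n), no empty lookup slots to skip
        print(key, value)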

Note that either way, we do not actually need to compute the hashes of any keys in order to iterate over the dictionary's items.


All of that said, there are some cases where iterating over a dictionary can take more than O(n) time. The reason for this is that although the capacity of the hashtable is enlarged when necessary to insert more items, it is not shrunk when items are removed. (Thanks to @HeapOverflow for pointing this out in the comments.)

If many items have been removed, then the proportion of dictionary items to the array capacity could be much smaller than the load factor. In that case, the array can be larger than a fixed constant times the number of items, so iterating takes more than O(n) time.

This is even true for the data structure used in more recent versions, which iterates using the additional array instead of the lookup array. When items are removed, their entries are simply replaced with NULL (CPython source); presumably this is done to allow removal in O(1) time while maintaining insertion order. Consequently, the additional array can also grow much longer than a fixed constant times the number of remaining items if many items are removed.
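You can observe this informally in CPython (a rough demonstration of implementation-specific behavior, not something guaranteed by the language):

    # Deletions do not shrink the dict, so iteration still walks the long
    # internal array; rebuilding the dict restores compact storage.
    import timeit

    big = {i: i for i in range(100_000)}
    for i in range(10, 100_000):
        del big[i]                 # keep only 10 items; capacity stays large

    fresh = dict(big)              # same 10 items, freshly sized storage

    print(timeit.timeit(lambda: list(big), number=1_000))    # noticeably slower
    print(timeit.timeit(lambda: list(fresh), number=1_000))  # much faster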

It is not common for an application to remove a large number of items from a dictionary; but if you need to do this and are concerned about iterating over such dictionaries efficiently, consider building a new dictionary from just the keys you need to keep, rather than deleting keys from the existing one (see the sketch below).
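A minimal sketch of that approach, using a dict comprehension (the `wanted` set here is a hypothetical stand-in for "the keys you need to keep"):

    # Rebuild with a dict comprehension instead of deleting in place.
    d = {i: i * i for i in range(100_000)}
    wanted = {0, 1, 2}             # hypothetical set of keys to retain

    d = {k: v for k, v in d.items() if k in wanted}   # compact new dict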

Upvotes: 5
