Reputation: 107092
Is iterating over some_dict.items() as efficient as iterating over a list of the same items in CPython?
Upvotes: 16
Views: 27146
Reputation: 25
Iterating directly over some_list is about twice as fast as iterating over some_dict.items(), but iterating over some_list by index takes almost as long as iterating over some_dict by key.
K = 1000000
some_dict = dict(zip(xrange(K), reversed(xrange(K))))
some_list = zip(xrange(K), xrange(K))
%timeit for t in some_list: t
10 loops, best of 3: 55.7 ms per loop
%timeit for i in xrange(len(some_list)):some_list[i]
10 loops, best of 3: 94 ms per loop
%timeit for key in some_dict: some_dict[key]
10 loops, best of 3: 115 ms per loop
%timeit for i,t in enumerate(some_list): t
10 loops, best of 3: 103 ms per loop
Upvotes: 0
Reputation: 150957
It depends on which version of Python you're using. In Python 2, some_dict.items() creates a new list, which takes up some additional time and uses up additional memory. On the other hand, once the list is created, it's a list, and so should have identical performance characteristics after the overhead of list creation is complete.
In Python 3, some_dict.items() creates a view object instead of a list, and I anticipate that creating and iterating over items() would be faster than in Python 2, since nothing has to be copied. But I also anticipate that iterating over an already-created view would be a bit slower than iterating over an already-created list, because dictionary data is stored somewhat sparsely, and I believe there's no good way for Python to avoid iterating over every bin in the dictionary, even the empty ones.
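To make the view-versus-list distinction concrete, here is a minimal Python 3 sketch. The variable names (view, snapshot) are illustrative and not from the original timings; it only shows that items() returns a live view while list() makes a copy.

```python
# Python 3: items() returns a view that tracks the dictionary,
# whereas list(...) takes a one-time copy of the (key, value) pairs.
some_dict = {"a": 1, "b": 2}

view = some_dict.items()   # no items are copied here
snapshot = list(view)      # explicit copy into a list

some_dict["c"] = 3         # mutate the dictionary afterwards

print(len(view))           # the view sees the new entry
print(len(snapshot))       # the copied list does not
```

Because the view is live, creating it is cheap, but every pass over it still walks the dictionary's internal storage.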
In Python 2, some timings confirm my intuitions:
>>> some_dict = dict(zip(xrange(1000), reversed(xrange(1000))))
>>> some_list = zip(xrange(1000), xrange(1000))
>>> %timeit for t in some_list: t
10000 loops, best of 3: 25.6 us per loop
>>> %timeit for t in some_dict.items(): t
10000 loops, best of 3: 57.3 us per loop
Iterating over some_dict.items() takes roughly twice as long as iterating over the list. Using iteritems is a bit faster...
>>> %timeit for t in some_dict.iteritems(): t
10000 loops, best of 3: 41.3 us per loop
But once the list returned by items() has been created, iterating over it is basically the same as iterating over any other list:
>>> some_dict_list = some_dict.items()
>>> %timeit for t in some_dict_list: t
10000 loops, best of 3: 26.1 us per loop
Python 3 can create and iterate over items() faster than Python 2 can (compare with 57.3 us above):
>>> some_dict = dict(zip(range(1000), reversed(range(1000))))
>>> %timeit for t in some_dict.items(): t
10000 loops, best of 3: 33.4 us per loop
But the time to create a view is negligible; the view itself is actually slower to iterate over than a list.
>>> some_list = list(zip(range(1000), reversed(range(1000))))
>>> some_dict_view = some_dict.items()
>>> %timeit for t in some_list: t
10000 loops, best of 3: 18.6 us per loop
>>> %timeit for t in some_dict_view: t
10000 loops, best of 3: 33.3 us per loop
This means that in Python 3, if you want to iterate many times over the items in a dictionary, and performance is critical, you can cut the time per pass by roughly 44% (33.3 us down to 18.6 us in the timings above) by caching the view as a list.
>>> some_list = list(some_dict_view)
>>> %timeit for t in some_list: t
100000 loops, best of 3: 18.6 us per loop
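For anyone without IPython, the same comparison can be sketched with the standard timeit module. This is an approximation of the %timeit runs above, not a reproduction of them: the helper names consume and best_time are made up for this example, and the absolute numbers will differ by machine and Python version.

```python
import timeit

some_dict = dict(zip(range(1000), reversed(range(1000))))
some_dict_view = some_dict.items()
cached_items = list(some_dict_view)  # one-time copy of the view


def consume(iterable):
    """Iterate over every element and do nothing with it."""
    for t in iterable:
        pass


def best_time(iterable, number=1000, repeat=3):
    """Best per-loop time in seconds over `repeat` runs of `number` loops."""
    runs = timeit.repeat(lambda: consume(iterable), number=number, repeat=repeat)
    return min(runs) / number


view_time = best_time(some_dict_view)
list_time = best_time(cached_items)

print("view: %.1f us per loop" % (view_time * 1e6))
print("list: %.1f us per loop" % (list_time * 1e6))
```

Note that the cached list is a snapshot: if some_dict changes later, the list has to be rebuilt, so this only pays off when the dictionary is stable across many passes.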
Upvotes: 30
Reputation: 80021
A small benchmark shows that iterating over a list is definitely faster.
def iterlist(list_):
    i = 0
    for _ in list_:
        i += 1
    return i
def iterdict(dict_):
    i = 0
    for _ in dict_.iteritems():
        i += 1
    return i
def noiterdict(dict_):
    i = 0
    for _ in dict_.items():
        i += 1
    return i
list_ = range(1000000)
dict_ = dict(zip(range(1000000), range(1000000)))
Tested with IPython on Python 2.7 (Kubuntu):
%timeit iterlist(list_)
10 loops, best of 3: 28.5 ms per loop
%timeit iterdict(dict_)
10 loops, best of 3: 39.7 ms per loop
%timeit noiterdict(dict_)
10 loops, best of 3: 86.1 ms per loop
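The benchmark above is Python 2 only, since iteritems() does not exist in Python 3. A rough Python 3 adaptation might look like the sketch below. It is not the original author's code: the function names count_list and count_dict_items are made up here, and N is reduced from 1000000 so the script finishes quickly.

```python
import timeit

N = 100000  # assumption: smaller than the original 1000000 to keep the run short


def count_list(list_):
    """Count elements by iterating over a list."""
    i = 0
    for _ in list_:
        i += 1
    return i


def count_dict_items(dict_):
    """Count elements by iterating over the items() view of a dict."""
    i = 0
    for _ in dict_.items():
        i += 1
    return i


list_ = list(range(N))  # range() is lazy in Python 3, so build a real list
dict_ = dict(zip(range(N), range(N)))

loops = 10
list_time = min(timeit.repeat(lambda: count_list(list_), number=loops, repeat=3)) / loops
dict_time = min(timeit.repeat(lambda: count_dict_items(dict_), number=loops, repeat=3)) / loops

print("list:         %.2f ms per loop" % (list_time * 1e3))
print("dict.items(): %.2f ms per loop" % (dict_time * 1e3))
```

In Python 3 there is no separate list-building variant to measure, because items() never copies; to reproduce the noiterdict case you would iterate over list(dict_.items()) instead.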
Upvotes: 8