TimY

Reputation: 5416

Python: Nested dictionaries vs. tuple keys (calling former is much faster)

I noticed that, in many cases, retrieving data from nested dictionaries is much faster than from dictionaries with tuple keys. This seems to contradict what was said here. Can anyone please explain what causes this? My issue is that the faster way seems less "pythonic" and can lead to huge ugly nested dictionaries (e.g. a_dict[a][b][c][d] instead of a_dict[a,b,c,d]).

Below is an example where this occurs when using integers as keys:

Using tuples:

print timeit.timeit('''
for (int1, int2) in indexes:
    x[ int1, int2 ]
''', setup='''
x={}
indexes = []
for int1 in xrange(1000):
    for int2 in xrange(1000):
        x[ int1, int2 ] = 'asd'
        indexes.append((int1, int2))
''', number = 100)

Using nested dictionaries:

print timeit.timeit('''
for (int1, int2) in indexes:
    x[int1][ int2 ]
''', setup='''
x={}
indexes = []
for int1 in xrange(1000):
    x[int1] = {}
    for int2 in xrange(1000):
        x[ int1 ][int2] = 'asd'
        indexes.append((int1, int2))
''', number = 100)

Results:

36.8627537348
12.2223380257

This is a very substantial difference in my opinion.

Upvotes: 3

Views: 3653

Answers (2)

Bakuriu

Reputation: 101919

First of all, the two benchmarks you are running aren't exactly equivalent. In the first case you are unpacking the keys and then repacking them into a tuple to perform the lookup.

The timings I get are:

In [3]: timeit.timeit('''
   ...: for key in indexes:
   ...:     x[key]
   ...: ''', setup='''
   ...: x={}
   ...: indexes = []
   ...: for a in range(1000):
   ...:     for b in range(1000):
   ...:         x[a,b] = 'asd'
   ...:         indexes.append((a, b))
   ...: ''', number=100)
Out[3]: 28.43928542699996
In [5]: timeit.timeit('''
   ...: for key in indexes:
   ...:     a,b = key
   ...:     x[a][b]
   ...: ''', setup='''
   ...: x={}
   ...: indexes = []
   ...: for a in range(1000):
   ...:     x[a] = {}
   ...:     for b in range(1000):
   ...:         x[a][b] = 'asd'
   ...:         indexes.append((a, b))
   ...: ''', number=100)
Out[5]: 10.23602621900045

(Using Python 3.3.) The difference is already a bit smaller: your benchmark showed about a 3x difference, mine about 2.78x.

The difference in performance is due to the fact that:

  • Tuples take more time to hash. Integers hash to themselves (hash(1) -> 1), so they take minimal time to hash, while a tuple has to compute the hash of each of its elements and combine them.

  • Every time you access a dictionary key, the dictionary has to check the keys for equality, and comparing tuples is slower than comparing integers.
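The hashing cost in the first point can be observed directly. This is a minimal sketch (exact numbers vary by machine and Python version; the specific values 12345 and (123, 45) are arbitrary):

```python
import timeit

# An int hashes to itself, so hashing it is almost free.
t_int = timeit.timeit('hash(12345)', number=10**6)

# A tuple must hash each element and combine the results,
# and (unlike e.g. strings) it does not cache its hash.
t_tup = timeit.timeit('hash((123, 45))', number=10**6)

print(t_int, t_tup)
```

On a typical CPython build the tuple hash comes out noticeably slower per call, which compounds over a million dictionary lookups.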

I'd also like to point out that your benchmark doesn't make much sense: why would you store all the keys in a separate list? Note that with tuple keys you can simply iterate over the dictionary, while in the nested case you need a nested loop. A nested dictionary will also probably use more memory.
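To illustrate the iteration point: the flat layout lets you walk every (key, value) pair in one loop, whereas the nested layout needs two (a minimal sketch with small toy dictionaries):

```python
# Flat dict with tuple keys: a single loop covers everything.
flat = {(a, b): a * b for a in range(3) for b in range(3)}
for (a, b), value in flat.items():
    pass  # process value here

# Nested dicts: an explicit nested loop is required.
nested = {a: {b: a * b for b in range(3)} for a in range(3)}
for a, inner in nested.items():
    for b, value in inner.items():
        pass  # process value here
```

Both structures hold the same data, but any code that traverses, copies, or serializes the nested version has to deal with the extra level.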

Before using a highly nested dictionary you should make sure that dictionary access really is the bottleneck. And I doubt that it will be the bottleneck, especially if you do anything other than accessing/storing items in the dictionary.

Nested sequences tend to be difficult to handle and you'll often need nested loops to handle them, which means more code and less maintainable code.

Upvotes: 2

sanyi

Reputation: 6239

It seems that hashing tuples is fairly costly: more so than creating the extra dictionaries and doing integer lookups on insert.

Upvotes: 0
