C_Z_
C_Z_

Reputation: 7796

Efficiently accessing indices: is storing indices in a dict the fastest way?

I have a list of tuples that I need to access the index of repeatedly. Is the fastest way to do this to store the tuples and their indices in a dict? Or is there a better way?

Given a tuple that exists in an ordered list of tuples, I need to access the index of it efficiently. I can think of two ways to do it; call .index on the list, or store the tuples as keys in a dictionary with the index as their value and look up the index by doing a dictionary lookup.

order = list(itertools.product(range(4),repeat=6))
%timeit order.index(random.choice(order))
# 10000 loops, best of 3: 35 µs per loop

order_dic = {i:order.index(i) for i in order}
%timeit order_dic[random.choice(order)]
# 1000000 loops, best of 3: 443 ns per loop

Upvotes: 2

Views: 279

Answers (2)

arekolek
arekolek

Reputation: 9620

Looks like it will be really hard to get any faster than a dict(). Just see how long random.choice() takes without any lookup:

>>> order = list(itertools.product(range(4),repeat=6))
>>> order_dic = {i:order.index(i) for i in order}

>>> %timeit order.index(random.choice(order))
>>> %timeit order_dic[random.choice(order)]
>>> %timeit random.choice(order)

10000 loops, best of 3: 98.6 µs per loop
1000000 loops, best of 3: 1.6 µs per loop
1000000 loops, best of 3: 1.39 µs per loop

So, the lookup consists mere 15% of the total you were measuring.

But maybe you could consider storing the index in the tuple? Or storing the tuple in another tuple, that also has the index? For example order = list(enumerate(order)).

You could assign the index to each tuple object:

order = list(itertools.product(range(4),repeat=6))
order_dic = {i:order.index(i) for i in order}

%timeit order.index(random.choice(order))
%timeit order_dic[random.choice(order)]
%timeit random.choice(order)

class mytuple(tuple): pass

order = list(map(mytuple, order))
for i, t in enumerate(order): t.rev_index = i

%timeit random.choice(order).rev_index

## -- End pasted text --
10000 loops, best of 3: 101 µs per loop
1000000 loops, best of 3: 1.59 µs per loop
1000000 loops, best of 3: 1.44 µs per loop
1000000 loops, best of 3: 1.53 µs per loop

It's faster than the hash map.

You can't set attributes of builtins dynamically, so we have to subclass it. We also can't use __slots__ because variable-size types like tuple don't support it, so every mytuple will have an additional __dict__. But apparently this still has a smaller memory footprint (almost twice) than using one dict to map all the indexes:

>>> from pympler.asizeof import asizeof
>>> asizeof(order_dic) \
... / (asizeof(order) - asizeof(list(itertools.product(range(4),repeat=6))))
1.8341168569509738

Upvotes: 0

Tuomas Laakkonen
Tuomas Laakkonen

Reputation: 1290

Almost always, the second option will be faster. This is because in the first example, you search the list in order to find the index, an O(n) operation on lists. Whereas in the second example, you lookup a value in the dictionary, which is O(1) on average.

However, space is something to consider here as the second option has the overhead of an additional dict. In Brandon Rhodes's talk at Pycon 2014, he says that dicts tend to use 3 to 4 times more space than they need, so that adding new elements is still fast, which could be a lot of overhead if you have a large list of tuples.

If list of tuples you use as an example is actually the list you are searching through, then there is a fast and memory-efficient way to calculate the index of a given tuple, relying on the way itertools.product works:

a[5] + a[4] >> 2 + a[3] >> 4 + a[2] >> 6 + a[1] >> 8 + a[0] >> 10

This will give the index of tuple a. This is nearly as fast as the second option but does not have the additional space requirement of an extra dict.

(The idea behind this method is that the tuple is essentially a six-digit base-four number, and since itertools.product outputs them in order, this will give the index.)

Upvotes: 2

Related Questions