mja

Reputation: 69

What is the efficiency of key=dictionary.get?

I have a dictionary and I'd like to find an elegant and effective way of locating a key:value pair whose value is minimal across the dictionary (one of the minimal pairs, if several exist). Apart from the obvious for-loop approaches, I've found several others on StackOverflow:

1st approach:

  temporary = [x for x in myDictionary.items()]  # list created just so sorted() can be used
  foundKey, minimalValue = sorted(temporary, key=lambda x: x[1])[0]

2nd approach:

  minimalValue = min(myDictionary.values())
  foundKey = min(myDictionary, key=myDictionary.get)

The 2nd runs a little bit faster for a myDictionary of thousands of items, but... I cannot find an explanation of the key=myDictionary.get construct. Isn't it possible to join the two min calls into a single foundKey, minimalValue = ... ?

Upvotes: 3

Views: 78

Answers (1)

Mad Physicist

Reputation: 114230

The second approach can be rephrased better as

foundKey = min(myDictionary, key=myDictionary.get)
minValue = myDictionary[foundKey]

The method get retrieves the value corresponding to the key being inspected, so instead of comparing key1 with key2, min compares myDictionary.get(key1) with myDictionary.get(key2).
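
To see what that key function does in practice, here is a minimal sketch with a toy dictionary of my own (the names and data are made up for illustration):

toy = {'x': 5, 'y': 2, 'z': 7}  # hypothetical example data

# min iterates over the keys 'x', 'y', 'z', but ranks them by
# toy.get('x') == 5, toy.get('y') == 2, toy.get('z') == 7.
foundKey = min(toy, key=toy.get)
print(foundKey, toy[foundKey])  # y 2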

You could equally well use __getitem__. It will likely be faster, but won't look as pretty:

foundKey = min(myDictionary, key=myDictionary.__getitem__)
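
For the keys min actually passes in, get and __getitem__ return the same values; the difference, which does not matter here because min only ever sees keys that exist in the dictionary, is how they treat missing keys. A small sketch of mine:

toy = {'a': 1}  # hypothetical example data

print(toy.get('missing'))       # None -- get falls back to a default
try:
    toy.__getitem__('missing')  # same as toy['missing']
except KeyError as err:
    print('KeyError:', err)     # __getitem__ raises instead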

By the way, the first approach has a few possible improvements:

temporary = list(myDictionary.items())
foundKey, minimalValue = sorted(temporary, key=lambda x: x[1])[0]

OR

temporary = [x[::-1] for x in myDictionary.items()]  # (value, key) pairs
minimalValue, foundKey = min(temporary)

OR

minimalValue, foundKey = min(zip(myDictionary.values(), myDictionary.keys()))
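
The last two variants rely on tuple comparison: min compares (value, key) pairs element by element, so the smallest value wins and, when several values tie, the smallest key breaks the tie (which also means the keys must be mutually comparable). A minimal sketch with a toy dictionary of my own:

toy = {'b': 2, 'a': 1, 'c': 1}  # hypothetical example data

# The pairs are (2, 'b'), (1, 'a'), (1, 'c'); min picks (1, 'a'),
# breaking the tie between the two 1s by the smaller key.
minimalValue, foundKey = min(zip(toy.values(), toy.keys()))
print(foundKey, minimalValue)  # a 1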

Timing

Let's make a dictionary of size n:

from random import shuffle

n = 10000  # the size used in the timings below
values = list(range(n))
shuffle(values)
myDictionary = dict(zip(map('{:08d}'.format, range(n)), values))
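
As a quick sanity check of my own, relying only on the setup above: values is a shuffled range(n), so the minimal value is always 0 and every correct approach must return a key that maps to it.

# Any correct approach must find a key whose value is 0,
# because values is just a shuffled range(n).
foundKey = min(myDictionary, key=myDictionary.__getitem__)
assert myDictionary[foundKey] == 0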

Timings for n=10000:

%%timeit
... temporary = [x for x in myDictionary.items()]
... foundKey, minimalValue = sorted(temporary, key=lambda x: x[1])[0]
5.76 ms ± 32.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
... minimalValue = min(myDictionary.values())
... foundKey = min(myDictionary, key=myDictionary.get)
1.85 ms ± 3.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So clearly running min (O(n)) is faster than sorted (O(n log n)).

%%timeit
... foundKey = min(myDictionary, key=myDictionary.get)
... minValue = myDictionary[foundKey]
1.36 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So running min and doing a lookup is faster than running min twice.

%timeit minimalValue, foundKey = min(zip(myDictionary.values(), myDictionary.keys()))
1.32 ms ± 6.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Running min without a lookup is faster yet.

%%timeit
... foundKey = min(myDictionary, key=myDictionary.__getitem__)
... minValue = myDictionary[foundKey]
1.27 ms ± 2.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using __getitem__ with a lookup is faster still.

TL;DR

It seems that the fastest approach of those shown here is

foundKey = min(myDictionary, key=myDictionary.__getitem__)
minValue = myDictionary[foundKey]

Upvotes: 4
