Reputation: 69
I have a dictionary and I'd like to find elegant and effective way of finding a key:value
pair where the value is minimal across the dictionary (one of the minimal, if many exist). Apart from obvious for
loop approaches I've found several others in StackOverflow:
1st approach:
temporary = [x for x in myDictionary.items()] # list is created just for using sorted()
foundKey, minimalValue = sorted(temporary, key=lambda x: x[1]) [0]
2nd approach:
minimalValue = min(myDictionary.values())
foundKey = min(myDictionary, key=myDictionary.get)
The 2nd runs little bit faster for myDictionary of thousands items, but ...I cannot find explanation of the key=myDictionary.get
construct. Isn't it possible to join two min's into one foundKey, minimalValue = ...
?
Upvotes: 3
Views: 78
Reputation: 114230
The second approach can be rephrased better as
foundKey = min(myDictionary, key=myDictionary.get)
minValue = myDictionary[foundKey]
The method get
retrieves the value corresponding to the key being inspected so instead of comparing key1
, key2
, you are comparing myDictionary.get[key1]
, myDictionary.get[key2]
.
You could equally well use __getitem__
. It will likely be faster, but won't look as pretty:
foundKey = min(myDictionary, key=myDictionary.__getitem__)
By the way, the first approach has two possible improvements:
temporary = list(myDictionary.items())
foundKey, minimalValue = sorted(temporary, key=lambda x: x[1])[0]
OR
temporary = [x[::-1] for x in myDictionary.items()]
foundKey, minimalValue = min(temporary)
OR
foundKey, minimalValue = min(zip(myDictionary.values(), myDictionary.keys()))
Timing
Let's make a dictionary of size n
:
from random import shuffle
values = list(range(n))
shuffle(values)
myDictionary = dict(zip(map('{:08d}'.format, range(n)), values))
Timings for n=10000
:
%%timeit
... temporary = [x for x in myDictionary.items()]
... foundKey, minimalValue = sorted(temporary, key=lambda x: x[1])[0]
5.76 ms ± 32.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
... minimalValue = min(myDictionary.values())
... foundKey = min(myDictionary, key=myDictionary.get)
1.85 ms ± 3.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So clearly running min
(O(n)
) is faster than sorted
(O(n log n)
).
%%timeit
... foundKey = min(myDictionary, key=myDictionary.get)
... minValue = myDictionary[foundKey]
1.36 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So running min
and doing a lookup is faster than running min
twice.
%timeit foundKey, minimalValue = min(zip(myDictionary.values(), myDictionary.keys()))
1.32 ms ± 6.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Running min
without a lookup is faster yet.
%%timeit
... foundKey = min(myDictionary, key=myDictionary.__getitem__)
... minValue = myDictionary[foundKey]
1.27 ms ± 2.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Using __getitem__
with a lookup is faster still.
TL;DR
It seems that the fastest approach of those shown here is
foundKey = min(myDictionary, key=myDictionary.__getitem__)
minValue = myDictionary[foundKey]
Upvotes: 4