Andreas
Andreas

Reputation: 173

Finding closest key in defaultdict

I am using defaultdicts to store lists of values where the keys are periods for which values could be observed. When looking up from a list of all periods of interest, I would like to find the closest period in my defaultdict (NB: not all periods are stored in the defaultdict).

As defaultdicts are not sorted however, the below approach does not return the correct value.

Is there a different way of returning the closest available key for defaultdicts?

from collections import defaultdict
import numpy as np

def_dict = defaultdict(list)
# entries that will be stored in the defaultdict
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}

# store items from regular dict in defaultdict 
for k, v in reg_dict.items():
    def_dict[k] = v

# Lookup periods
periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]

for period in periods:

    # this approach does not return the right keys as defaultdicts are not sorted
    closest_key = np.abs(np.array(list(def_dict.keys())) - period).argmin()

    print("period: ", period, " - looked up key: ", closest_key)

This returns the following:

period:  -1  - looked up key:  0
period:  0  - looked up key:  0
period:  1  - looked up key:  0
period:  2  - looked up key:  1
period:  3  - looked up key:  1
period:  4  - looked up key:  2
period:  5  - looked up key:  2
period:  6  - looked up key:  2
period:  7  - looked up key:  2
period:  8  - looked up key:  2

Upvotes: 0

Views: 138

Answers (4)

PM 2Ring
PM 2Ring

Reputation: 55489

As Eric said, to do this efficiently you should be using a binary search. However, if the number of keys is small the simple linear search may be adequate. There's no need to use a defaultdict or an OrderedDict, just sort the keys.

import numpy as np

# entries
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}

keys = np.array(sorted(reg_dict.keys()))
print('keys', keys)

# Lookup periods
periods = np.arange(-1, 9)

for period in periods:
    closest_key = keys[np.abs(keys - period).argmin()]
    print("period: ", period, " - looked up key: ", closest_key)

output

keys [-3  0  2  5]
period:  -1  - looked up key:  0
period:  0  - looked up key:  0
period:  1  - looked up key:  0
period:  2  - looked up key:  2
period:  3  - looked up key:  2
period:  4  - looked up key:  5
period:  5  - looked up key:  5
period:  6  - looked up key:  5
period:  7  - looked up key:  5
period:  8  - looked up key:  5

Upvotes: 1

Eric Duminil
Eric Duminil

Reputation: 54263

With an OrderedDict and sorted keys, you could use a binary search. For large amount of keys, lookup would be much faster than your current method.

Since you want the closest key, you'd need to find both the rightmost key lower than x and the leftmost key higher than x. After finding the index i for the rightmost key lower than x, the other candidate (leftmost key higher than x) will be on index i+1.

You need to make sure that those indices are still in your array.

Finally, you just need to compute the distance to x from those 2 values.

Here's the doc for bisect and np.searchsorted

Upvotes: 2

Piotr Kamoda
Piotr Kamoda

Reputation: 1006

I agree with the @septra that you need euqlidean distance, but that's achievable with numpy as well:

from collections import defaultdict
import numpy as np

def_dict = defaultdict(list)
# entries that will be stored in the defaultdict
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}

# store items from regular dict in defaultdict 
for k, v in reg_dict.items():
    def_dict[k] = v

periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
a = list(def_dict.keys())
for period in periods:
    closest_key  = np.sqrt(np.power(np.add(a, -period),2)).argmin()
    # OR closest_key  = np.abs(np.add(a, -period)).argmin()

    print("period: ", period, " - looked up key: ", a[closest_key])

Upvotes: 1

zenofsahil
zenofsahil

Reputation: 1753

The way I understand, you want an output similar to this?

[0, 0, 0, 2, 2, 5, 5, 5, 5, 5]

For the above, the logic would be

closest_key = [min(def_dict.keys(), key = lambda x: abs(x - p)) for p in periods]

Specifying the optional key parameter to built in python functions is useful in situations like these.

Upvotes: 1

Related Questions