Reputation: 46303
Let b be a dict with some values:
b = {}
b[90, 1] = 100
b[90, 55] = 101
b[90, 127] = 102
b[70, 1] = 40
b[70, 45] = 41
b[70, 107] = 42
How can I, in one pass, populate the dict with its missing values, each taken from the nearest existing key, for all 0 <= i <= 127, 0 <= j <= 127? (That gives 16384 keys in the dict, which is fine for my application.)
As an example, I would like b[73, 40] = b[70, 45] = 41, i.e. the nearest neighbour in the 2D plane.
Here is what I tried:
for i in range(128):
    for j in range(128):
        closest_key = min(b.keys(), key=lambda c: (c[0] - i) ** 2 + (c[1] - j) ** 2)
        b[i, j] = b[closest_key]
But it is very slow, probably because for each of the 128*128 grid points we loop again over all the keys to compute a distance!
How can we populate a dict with missing values, with nearest neighbour, in a more efficient way?
Upvotes: 2
Views: 226
Reputation: 20025
You are searching inside b for the closest key. But b doesn't contain only the original keys: it also contains the new keys you are inserting at each iteration. It is faster, and more correct, to search only among the initial keys:
initial_keys = set(b.keys())
for i in range(128):
    for j in range(128):
        if (i, j) not in initial_keys:
            closest_key = min(
                initial_keys, key=lambda c: (c[0] - i) ** 2 + (c[1] - j) ** 2
            )
            b[i, j] = b[closest_key]
This way the running time of the algorithm drops from O(n^4) to O(k * n^2), where n is the dimension size (128 here) and k is the number of initial keys. (The original loop scans a dict that grows to n^2 keys, once for each of the n^2 grid cells.)
EDIT:
You can use numpy for a great speedup:
import numpy as np

s = set(b.keys())
x = np.array([k[0] for k in s])
y = np.array([k[1] for k in s])
for i in range(128):
    for j in range(128):
        if (i, j) not in s:
            argmin = np.argmin((x - i) ** 2 + (y - j) ** 2)
            b[i, j] = b[x[argmin], y[argmin]]
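Since every one of the 128*128 cells is needed anyway, the remaining Python loop can itself be vectorized with broadcasting. This is a sketch, not part of the original answer, that computes the nearest key for all grid cells at once, using the question's sample data:

```python
import numpy as np

# The question's sample data.
b = {(90, 1): 100, (90, 55): 101, (90, 127): 102,
     (70, 1): 40, (70, 45): 41, (70, 107): 42}

keys = np.array(list(b))                       # (k, 2) array of source keys
vals = np.array([b[tuple(k)] for k in keys])   # (k,) values aligned with keys
ii, jj = np.mgrid[0:128, 0:128]                # grid coordinates, each (128, 128)

# Squared distance from every grid cell to every key: shape (128, 128, k).
d2 = (ii[..., None] - keys[:, 0]) ** 2 + (jj[..., None] - keys[:, 1]) ** 2
nearest = vals[d2.argmin(axis=-1)]             # (128, 128) array of nearest values

filled = {(i, j): int(nearest[i, j]) for i in range(128) for j in range(128)}
```

For the sample data this gives filled[73, 40] == 41, matching the example in the question; the distance array costs 128 * 128 * k memory, which is small for a handful of keys.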
Upvotes: 2
Reputation: 23332
A dictionary is absolutely inappropriate for this kind of use, unless you're happy with O(n) lookup complexity (and then using a list would be clearer). There is a class of hashing functions that could, conceivably, be used to implement an appropriate "dictionary", but Python's dict is definitely not up to the task.
If you need proper performance, you'll need another data structure. The simplest one would be a k-d tree; there's an implementation inside scipy.
You may want to review the Wikipedia article on nearest neighbor search.
Of course, you can use a dictionary as a cache if you're querying the same values repeatedly (as in Raydel Miranda's answer). But use it as a cache - not for storing/querying your actual data!
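As a concrete sketch of that suggestion (assuming scipy is installed; the dict b here is the question's sample data), scipy.spatial.cKDTree answers nearest-neighbour queries without scanning every key:

```python
import numpy as np
from scipy.spatial import cKDTree

# The question's sample data.
b = {(90, 1): 100, (90, 55): 101, (90, 127): 102,
     (70, 1): 40, (70, 45): 41, (70, 107): 42}

points = np.array(list(b))            # (k, 2) array of source keys
values = np.array([b[p] for p in b])  # values aligned with points
tree = cKDTree(points)

def nearest_value(i, j):
    # Query the tree for the source point closest to (i, j),
    # then return the value stored under that key.
    _, idx = tree.query([i, j])
    return values[idx]
```

nearest_value(73, 40) returns 41, matching the question's example; a full 128x128 grid can be filled with a single vectorized tree.query call over all grid coordinates.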
Upvotes: 1
Reputation: 14360
You could try to calculate on demand and build a cache of results. The advantage of this approach is that a point you never query is never computed.
Full example:
b = {}
b[90, 1] = 100
b[90, 55] = 101
b[90, 127] = 102
b[70, 1] = 40
b[70, 45] = 41
b[70, 107] = 42
class NeighbourDist:
    def __init__(self, source_dict):
        # Original dict.
        self.__source_dict = source_dict
        # Dict used as a cache.
        self.__cache_dict = {}

    def __calculate_distance(self, x0, x1, y0, y1):
        """Calculate the distance between two points."""
        dx = x1 - x0
        dy = y1 - y0
        return (dx ** 2 + dy ** 2) ** 0.5

    def __getitem__(self, key):
        """
        Look for the key in the cache dict; if it has not been calculated
        yet, find the value of the nearest source key, store it in
        __cache_dict and return it.
        """
        cached = self.__cache_dict.get(key)
        if cached is not None:
            return cached
        x0, y0 = key
        min_n = 0
        min_ = float('inf')
        for (x1, y1) in self.__source_dict.keys():
            dist = self.__calculate_distance(x0, x1, y0, y1)
            if min_ > dist:
                min_ = dist
                min_n = self.__source_dict[x1, y1]
        self.__cache_dict[key] = min_n
        return min_n

if __name__ == '__main__':
    d = NeighbourDist(b)
    print(d[73, 40])  # >>> 41
    print(d[73, 40])  # >>> 41, the second time the result comes from the cache.
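The same compute-on-demand idea can be written more compactly with functools.lru_cache from the standard library. This is a sketch of an alternative, not part of the original answer, again using the question's sample data:

```python
from functools import lru_cache

# The question's sample data.
b = {(90, 1): 100, (90, 55): 101, (90, 127): 102,
     (70, 1): 40, (70, 45): 41, (70, 107): 42}

@lru_cache(maxsize=None)
def nearest(i, j):
    # Value of the source key closest to (i, j); memoized, so each
    # grid point is computed at most once.
    key = min(b, key=lambda c: (c[0] - i) ** 2 + (c[1] - j) ** 2)
    return b[key]
```

nearest(73, 40) returns 41; calling it again with the same arguments is served from the cache.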
Upvotes: 0