Nitin
Nitin

Reputation: 2874

efficient way to map values between two lists in python

I have two lists. The first list is a coordinate pair list

[[x1, y1]
 [x2, y2]
 ...
 [xn, yn]]

The second list is a coordinate pair list along with a value associated with each pair

[[x1',y1',v1']
 [x2',y2',v2']
 ...
 [xn',yn',vn']]

I want to find the closest (x',y') in second list for each pair (x,y) in the first list and then map the value v' to (x,y).

My current solution is to loop through both lists and compute euclidean distance between every possible coordinate pairs and map to the minimum distance. But the original second list has 3 million entries! Is there a more efficient way to achieve this? Thanks.

Upvotes: 2

Views: 836

Answers (2)

tobias_k
tobias_k

Reputation: 82889

You could create a spatial map, mapping "areas" to points from your second list that are in this area, for example you can map 4-tuples of x-min, x-max, y-min, y-max to points, like this:

{(0, 10, 0, 10): [(2, 4, 5), (5, 1, 12), ...], (0, 10, 10, 20): [(4, 14, -1), ...] }

Now you can pick the corresponding area for your point from the first list, e.g., if the point is (24, 13), pick the list corresponding to area (20, 30, 10, 20). Of course, the size of those areas could be different depending on the distribution of the points.

If there are any points in this area, pick the one with smallest distance to the original point; otherwise look at the next "layer" of eight areas around that area, and so forth. Once you've found a point, you should expand one more layer, since there could still be a point in those layers closer to the original point. (see Figure)

enter image description here

Here, the red dot is the point from the first list, and the blue dots are those from the second list. The boxes correspond to the areas in the map. Although there are two dots in the area directly corresponding to the original points, there are points in the next "layer" that are closer, but you won't have to search further out.

Upvotes: 1

Aaron Digulla
Aaron Digulla

Reputation: 328556

Try to round the coordinates in the second list. That will give you lots of small clusters if coordinates. Use the rounded coordinate as a key in a dict with the list of coordinates+values as value.

for x,y in first_list:
    x_,y_ = round(x,y)
    l = d[(x_,y_)]
    ... find closest point in l...

With this algorithm, you only need to check a few points.

If the list is empty, you can retry using a more loose rounding.

Upvotes: 1

Related Questions