finding indices of close elements in 2D arrays

Question

I have 2 nd arrays where each row is a 3D point and one array is much bigger than the other.
i.e

array([[1., 2., 3.],
       [2.01, 5., 1.],
       [3., 3., 4.],
       [1., 4., 1.],
       [3., 6., 7.01]])

array([[3.02, 3.01, 4.0],
      [1.01, 1.99, 3.01],
      [2.98, 6.01, 7.01]])

And I know each point in the second array correspond to a point in the first array.
I would like to get the list of indices of correspondence,
I.e for this example it would be

 array([2,0,4])

as the first point in the second array is similar to the third point in the first array, the second point in the second array is similar to the first point in the first array, etc.

hilberts_drinking_problem · Accepted Answer

You can do this efficiently with a KDTree.

import numpy as np
from scipy.spatial import KDTree

x = np.array([[1., 2., 3.],
       [2.01, 5., 1.],
       [3., 3., 4.],
       [1., 4., 1.],
       [3., 6., 7.01]])

y = np.array([[1.01, 1.99, 3.01],
       [3.02, 3.01, 4.0],
       [2.98, 6.01, 7.01]])

result = KDTree(x).query(y)[1]

# In [16]: result                                                        
# Out[16]: array([0, 2, 4])

Thanks to Divakar for pointing out that scipy also provides a C implementation of KDTree, called cKDTree. It is 10x faster for the following benchmark:

x = np.random.rand(100_000, 3)
y = np.random.rand(100, 3)

def benchmark(TreeClass):
    return TreeClass(x).query(y)[1]

In [23]: %timeit q.benchmark(KDTree)                                   
322 ms ± 7.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [24]: %timeit q.benchmark(cKDTree)                                  
36.5 ms ± 763 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

finding indices of close elements in 2D arrays

Answers (2)

Related Questions