RexFuzzle
RexFuzzle

Reputation: 1432

Finding closest values in two numpy arrays

The aim here is speed- I am trying to get away from looping through the arrays in question. It can however be assumed that the two arrays are sorted.

a = np.arange(10)
b = np.array([2.3, 3.5, 5.8, 13])
c = somefunc(a,b)

Now somefunc should find the indices of a for which the values in b are closest too, i.e.

In []: c
Out[]: array([2, 3or4, 6, 9])  #3 or 4 depending on python2 or 3

Once again, this could be done with a loop, but I am looking for something a lot faster. I got quite close by taking the absolute difference type approach, something like:

np.argmin(np.abs(a[:, np.newaxis] - b), axis=0)

But even this is a little slow as a lot of unnecessary subtractions are done.

Upvotes: 4

Views: 3828

Answers (3)

RexFuzzle
RexFuzzle

Reputation: 1432

So using the suggestion from @Eelco to use searchsorted, I came to the following which is quicker with a larger dataset than the np.argmin on the vector method:

def finder(a, b):
    dup = np.searchsorted(a, b)
    uni = np.unique(dup)
    uni = uni[uni < a.shape[0]]
    ret_b = np.zeros(uni.shape[0])
    for idx, val in enumerate(uni):
        bw = np.argmin(np.abs(a[val]-b[dup == val]))
        tt = dup == val
        ret_b[idx] = np.where(tt == True)[0][bw]
    return np.column_stack((uni, ret_b))

Upvotes: 2

Eelco Hoogendoorn
Eelco Hoogendoorn

Reputation: 10759

scipy.spatial.cKDTree is the simplest solution to this problem; vectorized and more than likely good enough for your application; but not theoretically optimal given that your data is sorted.

Alternatively, you can use numpy.searchsorted. Use this to find the left-or right sided insertion point, and then compare that point and the next one to find the closest point.

Upvotes: 0

codelessbugging
codelessbugging

Reputation: 2909

keep track of two pointers, one for the current index for a, and one for b. as we increment pointer a, we keep track of the min difference and index between the elements being pointed to, until pointed_a > pointed_b. update the min difference and index again (if there are changes). there we have our answer for the first element. continue the search similarly by increasing the pointer for b, and pick up pointer a from where we left off.

complexity: O( len a + len b ), therefore linear

Upvotes: 0

Related Questions