Fnord

Reputation: 5895

Find the indices of the closest lower neighbors between two lists in Python

I have two NumPy arrays of unequal size: A (a presorted dataset) and B (a list of query values). For each element of B, I want to find the index of its closest "lower" neighbor in A, i.e. the last element of A that is less than or equal to it. Example code below:

import numpy as np

A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132])  # pre-sorted dataset
B = np.array([1.1, 1.9, 2.1, 5.0, 7.0])  # query values, not necessarily sorted
print(A.searchsorted(B))
# RESULT:  [1 1 2 4 4]
# DESIRED: [0 0 1 3 4]

In this example, B[0] lies between A[0] and A[1]. The default searchsorted returns 1 because that is the index at which B[0] would have to be inserted to keep A sorted, but what I want is the lower neighbor, at index 0. The same goes for B[1:4], and B[4] should be matched with A[4] because the two values are identical.
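The default behaviour can be checked directly. A minimal sketch, assuming any reasonably recent NumPy: with the default side='left', searchsorted returns for each query b the insertion index i satisfying A[i-1] < b <= A[i], which is why it points at the upper neighbor rather than the lower one.

```python
import numpy as np

A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132])  # pre-sorted dataset
B = np.array([1.1, 1.9, 2.1, 5.0, 7.0])               # query values

idx = A.searchsorted(B)  # default side='left': insertion points
print(idx.tolist())  # [1, 1, 2, 4, 4]

# Check the insertion-point invariant A[i-1] < b <= A[i] for each query.
# All queries here lie strictly inside A's range, so 1 <= i < len(A).
for b, i in zip(B, idx):
    assert A[i - 1] < b <= A[i]
```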

I could do something clunky like this:

desired = []
for b in B:
    idx = -1  # index of the last element of A that is <= b
    for a in A:
        if a > b:
            break
        idx += 1
    # Clamp so that values below A[0] map to index 0
    desired.append(max(idx, 0))

print(desired)
# RESULT: [0, 0, 1, 3, 4]

But there's got to be a prettier, more concise way to write this with NumPy. I'd like to keep my solution in NumPy because I'm dealing with large datasets, but I'm open to other options.

Upvotes: 2

Views: 1169

Answers (2)

Divakar

Reputation: 221564

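You can pass the optional argument side and set it to 'right', as described in the docs. With side='right', searchsorted returns the first index whose value is strictly greater than each query, so subtracting 1 from the resulting indices gives the last element that is less than or equal to it, which is the desired output: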

A.searchsorted(B,side='right')-1

Sample run:

In [63]: A
Out[63]: array([  0.456,   2.   ,   2.948,   3.   ,   7.   ,  12.132])

In [64]: B
Out[64]: array([ 1.1,  1.9,  2.1,  5. ,  7. ])

In [65]: A.searchsorted(B,side='right')-1
Out[65]: array([0, 0, 1, 3, 4])

In [66]: A.searchsorted(A,side='right')-1 # With itself
Out[66]: array([0, 1, 2, 3, 4, 5])
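One detail the sample run does not show is how queries outside A's range are handled. A minimal sketch, using hypothetical query values chosen for illustration: a query smaller than A[0] maps to -1, and a query at or above A[-1] maps to len(A) - 1. If you want the clamping behaviour of the loop in the question (0 for values below A[0]), np.maximum can be applied afterwards.

```python
import numpy as np

A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132])  # pre-sorted dataset
# Hypothetical queries, unsorted, including values outside A's range
queries = np.array([0.1, 12.132, 20.0, 5.0, 1.1])

lower = A.searchsorted(queries, side='right') - 1
print(lower.tolist())  # [-1, 5, 5, 3, 0]

# Clamp at 0 to mimic the loop in the question, which maps values below A[0] to 0
clamped = np.maximum(lower, 0)
print(clamped.tolist())  # [0, 5, 5, 3, 0]
```

Since searchsorted does a binary search per query, this stays O(len(B) * log(len(A))), and B does not need to be sorted.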

Upvotes: 3

Tobias

Reputation: 514

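Here's one way to do this. Because A is sorted, A>b is False up to some point and True from then on. np.argmax returns the index of the first occurrence of the maximum, i.e. the first True, which is the first element of A greater than b, so subtracting 1 gives the desired result.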

[np.argmax(A>b)-1 for b in B]

Edit: I got the inequality wrong initially; it works now.
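One caveat worth checking, shown here with hypothetical query values: when no element of A is greater than b (b >= A[-1]), A > b is all False, np.argmax returns 0, and the expression yields -1 instead of len(A) - 1. The sketch below compares it with the searchsorted approach and adds a guarded variant (an illustration, not part of the original answer). Note also that this builds a full boolean array per query, so it is O(len(A) * len(B)).

```python
import numpy as np

A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132])  # pre-sorted dataset
queries = np.array([1.1, 7.0, 12.132, 20.0])  # hypothetical; the last two are >= A[-1]

via_argmax = [int(np.argmax(A > b)) - 1 for b in queries]
via_searchsorted = (A.searchsorted(queries, side='right') - 1).tolist()

# Guarded variant: if nothing in A exceeds b, the lower neighbor is the last element
guarded = [int(np.argmax(A > b)) - 1 if (A > b).any() else len(A) - 1
           for b in queries]

print(via_argmax)        # [0, 4, -1, -1]
print(via_searchsorted)  # [0, 4, 5, 5]
print(guarded)           # [0, 4, 5, 5]
```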

Upvotes: 1
