Shan
Shan

Reputation: 19243

how to match two numpy array of unequal length?

i have two 1D numpy arrays. The lengths are unequal. I want to make pairs (array1_elemnt,array2_element) of the elements which are close to each other. Lets consider following example

    a = [1,2,3,8,20,23]
    b = [1,2,3,5,7,21,35]

The expected result is

    [(1,1), 
    (2,2), 
    (3,3), 
    (8,7),
    (20,21),
    (23,25)]

It is important to note that 5 is left alone. It could easily be done by loops but I have very large arrays. I considered using nearest neighbor. But felt like killing a sparrow with a canon.

Can anybody please suggest any elegant solution.

Thanks a lot.

Upvotes: 0

Views: 3525

Answers (5)

pv.
pv.

Reputation: 35125

You can do the following:

a = np.array([1,2,3,8,20,23])
b = np.array([1,2,3,5,7,21,25])

def find_closest(a, sorted_b):
    j = np.searchsorted(.5*(sorted_b[1:] + sorted_b[:-1]), a, side='right')
    return b[j]

b.sort()  # or, b = np.sort(b), if you don't want to modify b in-place
print np.c_[a, find_closest(a, b)]

# ->
# array([[ 1,  1],
#        [ 2,  2],
#        [ 3,  3],
#        [ 8,  7],
#        [20, 21],
#        [23, 25]])

This should be pretty fast. How it works is that searchsorted will find for each number a the index into the b past the midpoint between two numbers, i.e., the closest number.

Upvotes: 0

mike
mike

Reputation: 23811

You could use the built in map function to vectorize a function that does this. For example:

ar1 = np.array([1,2,3,8,20,23])
ar2 = np.array([1,2,3,5,7,21,35])
def closest(ar1, ar2, iter):
    x = np.abs(ar1[iter] - ar2)
    index = np.where(x==x.min())
    value = ar2[index]
    return value

def find(x):
    return closest(ar1, ar2, x)
c = np.array(map(find, range(ar1.shape[0])))

In the example above, it looked like you wanted to exclude values once they had been paired. In that case, you could include a removal process in the first function like this, but be very careful about how array 1 is sorted:

 def closest(ar1, ar2, iter):
    x = np.abs(ar1[iter] - ar2)
    index = np.where(x==x.min())
    value = ar2[index]
    ar2[ar2==value] = -10000000
    return value

Upvotes: 1

steabert
steabert

Reputation: 6878

I think one can do it like this:

  1. create two new structured arrays, such that there is a second index which is 0 or 1 indicating to which array the value belongs, i.e. the key
  2. concatenate both arrays
  3. sort the united array along the first field (the values)
  4. use 2 stacks: go through the array putting elements with key 1 on the left stack, and when you cross an element with key 0, put them in the right stack. When you reach the second element with key 0, for the first with key 0 check the top and bottom of the left and right stacks and take the closest value (maybe with a maximum distance), switch stacks and continue.

sort should be slowest step and max total space for the stacks is n or m.

Upvotes: 0

HYRY
HYRY

Reputation: 97291

The best method I can think of is use a loop. If loop in python is slow, you can use Cython to speedup you code.

Upvotes: 0

Johannes Charra
Johannes Charra

Reputation: 29923

How about using the Needleman-Wunsch algorithm? :)

The scoring matrix would be trivial, as the "distance" between two numbers is just their difference.

But that will probably feel like killing a sparrow with a tank ...

Upvotes: 3

Related Questions