Reputation: 19243
i have two 1D numpy arrays. The lengths are unequal. I want to make pairs (array1_elemnt,array2_element) of the elements which are close to each other. Lets consider following example
a = [1,2,3,8,20,23]
b = [1,2,3,5,7,21,35]
The expected result is
[(1,1),
(2,2),
(3,3),
(8,7),
(20,21),
(23,25)]
It is important to note that 5 is left alone. It could easily be done by loops but I have very large arrays. I considered using nearest neighbor. But felt like killing a sparrow with a canon.
Can anybody please suggest any elegant solution.
Thanks a lot.
Upvotes: 0
Views: 3525
Reputation: 35125
You can do the following:
a = np.array([1,2,3,8,20,23]) b = np.array([1,2,3,5,7,21,25]) def find_closest(a, sorted_b): j = np.searchsorted(.5*(sorted_b[1:] + sorted_b[:-1]), a, side='right') return b[j] b.sort() # or, b = np.sort(b), if you don't want to modify b in-place print np.c_[a, find_closest(a, b)] # -> # array([[ 1, 1], # [ 2, 2], # [ 3, 3], # [ 8, 7], # [20, 21], # [23, 25]])
This should be pretty fast. How it works is that searchsorted
will find for each number a
the index into the b
past the midpoint between two numbers, i.e., the closest number.
Upvotes: 0
Reputation: 23811
You could use the built in map function to vectorize a function that does this. For example:
ar1 = np.array([1,2,3,8,20,23])
ar2 = np.array([1,2,3,5,7,21,35])
def closest(ar1, ar2, iter):
x = np.abs(ar1[iter] - ar2)
index = np.where(x==x.min())
value = ar2[index]
return value
def find(x):
return closest(ar1, ar2, x)
c = np.array(map(find, range(ar1.shape[0])))
In the example above, it looked like you wanted to exclude values once they had been paired. In that case, you could include a removal process in the first function like this, but be very careful about how array 1 is sorted:
def closest(ar1, ar2, iter):
x = np.abs(ar1[iter] - ar2)
index = np.where(x==x.min())
value = ar2[index]
ar2[ar2==value] = -10000000
return value
Upvotes: 1
Reputation: 6878
I think one can do it like this:
sort should be slowest step and max total space for the stacks is n or m.
Upvotes: 0
Reputation: 97291
The best method I can think of is use a loop. If loop in python is slow, you can use Cython to speedup you code.
Upvotes: 0
Reputation: 29923
How about using the Needleman-Wunsch algorithm? :)
The scoring matrix would be trivial, as the "distance" between two numbers is just their difference.
But that will probably feel like killing a sparrow with a tank ...
Upvotes: 3