Reputation: 918
I have two arrays A
and B
. Let them both be one-dimensional for now.
For each element in A
I need the index of the element in B
that best matches the element in A
.
I can solve this using a list expression
import numpy as np
A = np.array([ 1, 3, 1, 5 ])
B = np.array([ 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ])
indices = np.array([ np.argmin(np.abs(B-a)) for a in A ])
print(indices) # prints [0 2 0 4]
print(B[indices]) # prints [1.1 3.1 1.1 5.1]
but this method is really slow for huge arrays.
I am wondering if there is a faster way utilizing optimized numpy functions.
Upvotes: 5
Views: 673
Reputation: 1815
Other than already given answer using broadcast, there is this one which does internal broadcast.
I wanted to keep my other answer separate as it uses other than numpy
np.argmin(np.abs(np.subtract.outer(A, B)), axis=1)
You can call outer on many of ufuncs.
Upvotes: 0
Reputation: 1815
Broadcasting can come to bite you back(tmp array creation will be included in time also), below method does not use lot of tmp memory so is memory efficient. Here is reference when broadcasting slow down due to too much memory usage
Keeping this for reference here. Other than that, you can write custom functions in cython numpy. Cython use different optimization compared to numba. So there is need to experiment which one optimize better. But for numba you can stay in python and write c like code
import numpy as np
import numba as nb
A = np.array([ 1, 3, 1, 5 ], dtype=np.float64)
B = np.array([ 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ], dtype=np.float64)
# Convert to fast optimized machine code
@nb.njit(
# Signature of Input
(nb.float64[:], nb.float64[:]),
# Optional
parallel=True
)
def less_mem_ver(A, B):
arg_mins = np.empty(A.shape, dtype=np.int64)
# nb.prange is for parallel=True
# what can be parallelized
# Usage of for loop because, to prevent creation of tmp arrays due to broadcasting
# It takes time to allocate tmp array
# No loss in writing for loop as numba will vectorize this just like numpy
for i in nb.prange(A.shape[0]):
min_num = 1e+307
min_index = -1
for j in range(B.shape[0]):
t = np.abs(A[i] - B[j])
if t < min_num:
min_index = j
min_num = t
arg_mins[i] = min_index
return arg_mins
less_mem_ver(A, B)
Upvotes: 2
Reputation: 260640
You can compute the absolute difference between B and reshaped A, and use argmin
on axis=1
:
np.argmin(np.abs(B-A[:,None]), axis=1)
output: array([0, 2, 0, 4])
Upvotes: 3