Reputation: 10203
Consider two arrays of different length:
A = np.array([58, 22, 86, 37, 64])
B = np.array([105, 212, 5, 311, 253, 419, 123, 461, 256, 464])
For each value in A, I want to find the smallest absolute difference between that value and the values in B. I use Pandas because my actual arrays are subsets of Pandas dataframes, but also because the apply method is a convenient (albeit slow) approach to taking the difference between two different-sized arrays:
In [22]: pd.Series(A).apply(lambda x: np.min(np.abs(x-B)))
Out[22]:
0 47
1 17
2 19
3 32
4 41
dtype: int64
BUT I also want to keep the sign, so the desired output is:
0 -47
1 17
2 -19
3 32
4 -41
dtype: int64
[update] My actual arrays A and B are approximately 5e4 and 1e6 in length, so a low-memory solution would be ideal. Also, I wish to avoid using Pandas because it is very slow on the actual arrays.
Upvotes: 3
Views: 2279
Reputation: 294258
I couldn't help myself. This is not what you should do! But, it is cute.
[min(x - B, key=abs) for x in A]
[-47, 17, -19, 32, -41]
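If N = len(A) and M = len(B), then the searchsorted solution below should be O((N + M) log(M)): sorting B costs O(M log(M)), and each of the N binary searches costs O(log(M)). If B is already sorted, then the sorting step is unnecessary and this becomes O(N log(M)).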
C = np.sort(B)
a = C.searchsorted(A)
# It is possible that `a` has a value equal to the length of `C`
# in which case the searched value exceeds all those found in `C`.
# In that case, we want to clip the index value to the right most index
# which is `len(C) - 1`
right = np.minimum(a, len(C) - 1)
# For those searched values that are less than the first value in `C`
# the index value will be `0`. When I subtract `1`, we'll end up with
# `-1` which makes no sense. So we clip it to `0`.
left = np.maximum(a - 1, 0)
For clipped values, we'll end up comparing a value to itself and therefore it is safe.
right_diff = A - C[right]
left_diff = A - C[left ]
np.where(np.abs(right_diff) <= np.abs(left_diff), right_diff, left_diff)
array([-47, 17, -19, 32, -41])
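The steps above can be collected into one function. This is only a sketch: the name nearest_signed_diff is hypothetical, and it assumes B is non-empty and that both inputs are one-dimensional signed numeric arrays.

```python
import numpy as np

def nearest_signed_diff(A, B):
    """For each value in A, return A[i] - b for the b in B closest to A[i]."""
    A = np.asarray(A)
    C = np.sort(B)                      # skip this if B is already sorted
    a = C.searchsorted(A)               # insertion index of each A value in C
    right = np.minimum(a, len(C) - 1)   # clip indices that run past the end
    left = np.maximum(a - 1, 0)         # clip indices that fall before the start
    right_diff = A - C[right]
    left_diff = A - C[left]
    # Keep whichever neighbour is closer, preserving the sign of the difference.
    return np.where(np.abs(right_diff) <= np.abs(left_diff), right_diff, left_diff)

A = np.array([58, 22, 86, 37, 64])
B = np.array([105, 212, 5, 311, 253, 419, 123, 461, 256, 464])
result = nearest_signed_diff(A, B)
print(result)
```

Apart from the sorted copy of B, the extra memory is proportional to len(A), which is what makes this approach attractive for the large arrays mentioned in the question.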
Upvotes: 3
Reputation: 402463
Let's use broadcasted subtraction here. We then use argmin to find the position of the smallest absolute difference in each row, then extract the signed values in a subsequent step.
u = A[:,None] - B
idx = np.abs(u).argmin(axis=1)
u[np.arange(len(u)), idx]
# array([-47, 17, -19, 32, -41])
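This uses pure NumPy broadcasting, so it is fast for small inputs. Note that it materializes a full len(A) x len(B) array, so for arrays of about 5e4 and 1e6 elements it will not fit in memory.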
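One way to bound the memory of the broadcast is to process A in chunks, so that only a chunk_size x len(B) block exists at any time. The sketch below is an assumption rather than part of the original answer; the function name and the chunk_size parameter are hypothetical, and chunk_size would need tuning to the available memory.

```python
import numpy as np

def nearest_signed_diff_chunked(A, B, chunk_size=256):
    """Broadcasted nearest signed difference, computed one block of A at a time."""
    A = np.asarray(A)
    B = np.asarray(B)
    out = np.empty(len(A), dtype=np.result_type(A, B))
    for start in range(0, len(A), chunk_size):
        # Block of shape (rows_in_chunk, len(B)) instead of (len(A), len(B)).
        u = A[start:start + chunk_size, None] - B
        idx = np.abs(u).argmin(axis=1)
        out[start:start + chunk_size] = u[np.arange(len(u)), idx]
    return out

A = np.array([58, 22, 86, 37, 64])
B = np.array([105, 212, 5, 311, 253, 419, 123, 461, 256, 464])
result = nearest_signed_diff_chunked(A, B, chunk_size=2)
print(result)
```

The total work is still proportional to len(A) * len(B), so chunking only addresses memory, not running time.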
Upvotes: 5
Reputation: 150735
Since you tagged pandas:
# compute the diff by broadcasting
diff = pd.DataFrame(A[None,:] - B[:,None])
# minimum absolute difference for each value in A (one per column)
min_val = diff.abs().min()
# mask with where and stack to drop na
diff.where(diff.abs().eq(min_val)).stack()
Output:
0  0   -47.0
   2   -19.0
   4   -41.0
2  1    17.0
   3    32.0
dtype: float64
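The stacked output is ordered by the rows of B, with the position in A as the second index level. If one value per element of A, in the order of A, is wanted instead, a possible variation (an assumption, not part of the original answer) is to look up the row of the minimum with idxmin:

```python
import numpy as np
import pandas as pd

A = np.array([58, 22, 86, 37, 64])
B = np.array([105, 212, 5, 311, 253, 419, 123, 461, 256, 464])

# Rows correspond to values of B, columns to values of A.
diff = pd.DataFrame(A[None, :] - B[:, None])

# For each column, the row label where the absolute difference is smallest.
row_of_min = diff.abs().idxmin()

# Pick the signed difference at (row_of_min[j], j) for every column j.
result = pd.Series(diff.to_numpy()[row_of_min.to_numpy(), np.arange(len(A))])
print(result)
```

Like the original, this builds the full len(B) x len(A) frame, so it is only practical for small inputs.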
Upvotes: 2
Reputation: 1533
np.argmin can find the position of the minimum value. Therefore you can simply do this:
pd.Series(A).apply(lambda x: x-B[np.argmin(np.abs(x-B))])
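Since the question asks to avoid Pandas on the real data, the same argmin idea can be sketched with NumPy alone. This is an illustrative variation, not the answer's own code; it loops over A in Python and only ever holds one array of length len(B) at a time.

```python
import numpy as np

A = np.array([58, 22, 86, 37, 64])
B = np.array([105, 212, 5, 311, 253, 419, 123, 461, 256, 464])

# For each x, locate the closest value in B and keep the signed difference.
result = np.array([x - B[np.argmin(np.abs(x - B))] for x in A])
print(result)
```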
Upvotes: 2