Reputation: 55
I would like to compare values from columns of two different numpy arrays A and B. More specifically, the first column of A contains values from a real experiment that I want to match with the theoretical values given in the third column of B.
There are no perfect matches, so I have to use a tolerance, e.g. 0.01. For each value in A, I expect 0 to 20 matches in B with respect to the selected tolerance. As a result, I would like to get those rows of B that are within the tolerance of a value in A.
To be more specific, here is an example:
A = array([[ 2.83151742e+02, a0],
[ 2.83155339e+02, a1],
[ 3.29241719e+02, a2],
[ 3.29246229e+02, a3]])
B = array([[ 0, 0, 3.29235519e+02, ...],
[ 0, 0, 3.29240819e+02, ...],
[ 0, 0, 3.29241919e+02, ...],
[ 0, 0, 3.29242819e+02, ...]])
So here all values in B[:, 2] would match A[2,0] and A[3,0] for a tolerance of 0.02.
My preferred result would look like this, with the matched value of A in C[:,0] and the difference between C[:,0] and C[:,2] in C[:,1]:
C = array([[ 3.29241719e+02, c0, 3.29235519e+02, ...],
[ 3.29241719e+02, c1, 3.29240819e+02, ...],
[ 3.29241719e+02, c2, 3.29241919e+02, ...],
[ 3.29241719e+02, c3, 3.29242819e+02, ...],
[ 3.29246229e+02, c4, 3.29235519e+02, ...],
[ 3.29246229e+02, c5, 3.29240819e+02, ...],
[ 3.29246229e+02, c6, 3.29241919e+02, ...],
[ 3.29246229e+02, c7, 3.29242819e+02, ...]])
Typically, A has shape (500, 2) and B has shape (300000, 11). I can solve it with for-loops, but it takes ages.
What would be the most efficient way to do this comparison?
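For reference, a minimal sketch of the loop-based baseline described above, assuming the measured values sit in A[:, 0], the theoretical values in B[:, 2], and that C keeps B's columns from the third one onward (as in the example); the helper name match_loop is made up:

import numpy as np

def match_loop(A, B, tol=0.02):
    # Slow baseline: nested loops over A's measured values and B's rows.
    rows = []
    for a in A[:, 0]:                      # measured values
        for b_row in B:
            diff = a - b_row[2]            # compare against theoretical value
            if abs(diff) <= tol:
                # C row: matched A value, difference, then B's columns from the third on
                rows.append(np.concatenate(([a, diff], b_row[2:])))
    return np.array(rows)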
Upvotes: 3
Views: 1120
Reputation: 14399
I'd imagine it would be something like
i = np.nonzero(np.isclose(A[:,:,None], B[:, 2]))[-1]
np.isclose accepts a few different tolerance parameters (rtol and atol). The values in B that are close to the A values would then be B[i, 2].
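Building on that, a hedged sketch of how the full C array from the question could be assembled; the atol=0.02 tolerance, the restriction to A[:, 0], and keeping B's columns from the third one onward are assumptions taken from the question's example, and build_C is just an illustrative name:

import numpy as np

def build_C(A, B, tol=0.02):
    # Boolean matrix of shape (len(A), len(B)): True where the measured
    # value A[i, 0] is within tol of the theoretical value B[j, 2].
    close = np.isclose(A[:, 0, None], B[None, :, 2], rtol=0.0, atol=tol)
    ai, bi = np.nonzero(close)             # indices of matching (A row, B row) pairs
    matched_a = A[ai, 0]
    diff = matched_a - B[bi, 2]
    # C: matched A value, difference, then the matched B row from its third column on
    return np.column_stack((matched_a, diff, B[bi, 2:]))

Note that the broadcasted comparison allocates an intermediate array of shape (len(A), len(B)); for roughly 500 x 300000 entries that is still manageable, but if memory becomes an issue you can keep a loop over A and vectorize only the inner comparison.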
Upvotes: 1