Reputation: 55
I would like to compare values from columns of two different numpy arrays A and B. More specifically, the first column of A contains values from a real experiment that I want to match with the theoretical values given in the third column of B.
There are no perfect matches, so I have to use a tolerance, e.g. 0.01. For each value in A, I expect 0 to 20 matches in B with respect to the selected tolerance. As a result, I would like to get those rows of B that are within the tolerance of a value in A.
To be more specific, here is an example:
A = array([[ 2.83151742e+02, a0],
[ 2.83155339e+02, a1],
[ 3.29241719e+02, a2],
[ 3.29246229e+02, a3]])
B = array([[ 0, 0, 3.29235519e+02, ...],
[ 0, 0, 3.29240819e+02, ...],
[ 0, 0, 3.29241919e+02, ...],
[ 0, 0, 3.29242819e+02, ...]])
So here all values in B[:, 2] would match A[2,0] and A[3,0] for a tolerance of 0.02.
My preferred result would look like this, with the matched value of A in C[:,0] and the difference between C[:,0] and C[:,2] in C[:,1]:
C = array([[ 3.29241719e+02, c0, 3.29235519e+02, ...],
[ 3.29241719e+02, c1, 3.29240819e+02, ...],
[ 3.29241719e+02, c2, 3.29241919e+02, ...],
[ 3.29241719e+02, c3, 3.29242819e+02, ...],
[ 3.29246229e+02, c4, 3.29235519e+02, ...],
[ 3.29246229e+02, c5, 3.29240819e+02, ...],
[ 3.29246229e+02, c6, 3.29241919e+02, ...],
[ 3.29246229e+02, c7, 3.29242819e+02, ...]])
Typically, A has shape (500, 2) and B has shape (300000, 11). I can solve it with for-loops, but it takes ages.
What would be the most efficient way to do this comparison?
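For reference, a minimal sketch of the loop-based baseline described above, assuming the measured values sit in A[:, 0], the theoretical values in B[:, 2], and that C keeps B's columns from the third one onward (as in the example); the helper name match_loop is made up:

import numpy as np

def match_loop(A, B, tol=0.02):
    # Slow baseline: nested loops over A's measured values and B's rows.
    rows = []
    for a in A[:, 0]:                      # measured values
        for b_row in B:
            diff = a - b_row[2]            # compare against theoretical value
            if abs(diff) <= tol:
                # C row: matched A value, difference, then B's columns from the third on
                rows.append(np.concatenate(([a, diff], b_row[2:])))
    return np.array(rows)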
Upvotes: 3
Views: 1120
Reputation: 14399
I'd imagine it would be something like
i = np.nonzero(np.isclose(A[:,:,None], B[:, 2]))[-1]
np.isclose accepts a few different tolerance parameters (rtol and atol). The values in B that are close to the A values would then be B[i, 2].
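Building on that, a hedged sketch of how the full C array from the question could be assembled; the atol=0.02 tolerance, the restriction to A[:, 0], and keeping B's columns from the third one onward are assumptions taken from the question's example, and build_C is just an illustrative name:

import numpy as np

def build_C(A, B, tol=0.02):
    # Boolean matrix of shape (len(A), len(B)): True where the measured
    # value A[i, 0] is within tol of the theoretical value B[j, 2].
    close = np.isclose(A[:, 0, None], B[None, :, 2], rtol=0.0, atol=tol)
    ai, bi = np.nonzero(close)             # indices of matching (A row, B row) pairs
    matched_a = A[ai, 0]
    diff = matched_a - B[bi, 2]
    # C: matched A value, difference, then the matched B row from its third column on
    return np.column_stack((matched_a, diff, B[bi, 2:]))

Note that the broadcasted comparison allocates an intermediate array of shape (len(A), len(B)); for roughly 500 x 300000 entries that is still manageable, but if memory becomes an issue you can keep a loop over A and vectorize only the inner comparison.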
Upvotes: 1