cndata

Reputation: 13

Find values close to multiples in NumPy array

I have a numpy array as follows:

array([    90.,    180.,    270.,    360.,    450.,    540.,    630.,
      720.,    810.,    900.,    990.,   1080.,   1170.,   1260.,
     1350.,   1440.,   1530.,   1620.,   1710.,   1800.,   1890.,
     1980.,   2070.,   2160.,   2250.,   2340.,   2430.,   2520.,
     2610.,   2700.,   2790.,   2880.,   2970.,   3060.,   3150.,
     3240.,   3330.,   3420.,   3510.,   3600.,   3690.,   3780.,
     3870.,   3960.,   4050.,   4140.,   4230.,   4320.,   4410.,
     4500.,   4590.,   4680.,   4770.,   4860.,   4950.,   5040.,
     5130.,   5220.,   5310.,   5400.,   5490.,   5580.,   5670.,
     5760.,   5850.,   5940.,   6030.,   6120.,   6210.,   6300.,
     6390.,   6480.,   6570.,   6660.,   6750.,   6840.,   6930.,
     7020.,   7110.,   7200.,   7290.,   7380.,   7470.,   7560.,
     7650.,   7740.,   7830.,   7920.,   8010.,   8100.,   8190.,
     8280.,   8370.,   8460.,   8550.,   8640.,   8730.,   8820.,
     8910.,   9000.,   9090.,   9180.,   9270.,   9360.,   9450.,
     9540.,   9630.,   9720.,   9810.,   9900.,   9990.,  10080.,
    10170.,  10260.,  10350.,  10440.,  10530.,  10620.,  10710.,
    10800.])

I am trying to match the values that come closest to 4300 or a multiple of 4300. The condition should return True for 4320 and 8640 in the array above and False for all other values. How can I accomplish that?

TIA

Edit 1:

After looking at the answer, I need to clarify: for each multiple, I want the closest value to return True only if it lies within ±20% of that multiple.

Upvotes: 0

Views: 155

Answers (2)

jakevdp

Reputation: 86433

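You can do this using the modulo operator and the floor division operator. Here x is the array from the question, and the allowed remainder grows with the index of the multiple (20 for 4300, 40 for 8600, and so on):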

>>> tol = 20
>>> x[x % 4300 <= abs(x // 4300) * tol]
array([ 4320.,  8640.])
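The modulo test above only catches values at or just above a multiple. A minimal sketch of a symmetric variant follows, which measures the distance to the nearest multiple instead. It is a swapped-in technique, not the expression above; it assumes x is the array from the question (rebuilt here from its 90-step pattern), and the names step, k and dist are only for illustration.

```python
import numpy as np

# Assumed reconstruction of the question's array: 90, 180, ..., 10800.
x = 90.0 * np.arange(1, 121)

step = 4300
tol = 20  # allowed distance per multiple index, as in the answer above

# Index of the nearest multiple for each value, and the distance to it.
k = np.round(x / step)
dist = np.abs(x - step * k)

# The tolerance grows with k; for k == 0 it is 0, so values near zero drop out.
result = x[dist <= k * tol]
print(result)
```

With the question's data this selects 4320 and 8640, the same two values as the modulo expression, but it would also accept a value slightly below a multiple.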

Edit:

Given the updated question (finding the closest match within a particular tolerance), what you're looking at is a specialization of a nearest neighbors problem. I'd probably do this with scipy's cKDTree, to be most efficient. For example:

>>> from scipy.spatial import cKDTree
>>> tree = cKDTree(x[:, None])
>>> queries = 4300 * np.arange(np.floor((x / 4300).min()), np.ceil((x / 4300).max()) + 1)
>>> queries
array([     0.,   4300.,   8600.,  12900.])
>>> d, i = tree.query(queries[:, None], k=1)
>>> x[i]
array([    90.,   4320.,   8640.,  10800.])
>>> tol = 0.2
>>> x[i][d < tol * x[i]]
array([  4320.,   8640.,  10800.])
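For a sorted 1-D array like this one, the same nearest-value lookup can also be sketched with np.searchsorted alone, without SciPy. This is an alternative technique, not the cKDTree approach above, and it assumes x is sorted in ascending order (true for the question's data, rebuilt here from its 90-step pattern). The names left, right and matched are illustrative.

```python
import numpy as np

# Assumed reconstruction of the question's array: 90, 180, ..., 10800 (sorted).
x = 90.0 * np.arange(1, 121)

# Same query points as in the answer: 0, 4300, 8600, 12900.
queries = 4300 * np.arange(np.floor((x / 4300).min()),
                           np.ceil((x / 4300).max()) + 1)

# Insertion point of each query, clipped so both neighbours exist.
right = np.clip(np.searchsorted(x, queries), 1, len(x) - 1)
left = right - 1

# Keep whichever neighbour is closer to the query.
i = np.where(np.abs(queries - x[left]) <= np.abs(x[right] - queries),
             left, right)
d = np.abs(x[i] - queries)

tol = 0.2
matched = x[i][d < tol * x[i]]
print(x[i])
print(matched)
```

Note that 10800 survives the filter because the tolerance is scaled by the matched value (2100 < 0.2 * 10800 = 2160). A fixed tolerance such as 0.2 * 4300 = 860 would drop it.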

Upvotes: 3

fuglede

Reputation: 18211

If I'm reading the question correctly (cf. my comment on @jakevdp's answer), only one value should be returned for each multiple. In that case, you can build a boolean mask through broadcasting by doing something like

a = np.arange(0, np.max(x)+4300, 4300)[np.newaxis].T
b = np.zeros(x.shape, dtype=np.bool_)
b[np.argmin(np.abs(a - x), axis=1)] = 1

where x is your input array. In the given example, this produces

array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True], dtype=bool)

Edit: The updated question additionally requires the result to lie within a certain range of the multiple. We can handle that by adding an extra step to the above; here the range is taken as 20% of 4300, i.e. 860:

a = np.arange(0, np.max(x) + 4300, 4300)[np.newaxis].T
idx = np.argmin(np.abs(a - x), axis=1)
b = np.zeros(x.shape, dtype=np.bool_)
b[idx[np.abs(x[idx] - a.ravel()) < 4300*0.2]] = 1

In this case, b becomes

array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False], dtype=bool)

Edit #2, given the comments on @jakevdp's answer: To ignore the first multiple (zero), just use

a = np.arange(4300, np.max(x)+4300, 4300)[np.newaxis].T

instead
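Putting the pieces of this answer together, a minimal self-contained sketch might look like the following. The function name closest_to_multiples and its parameters rel_tol and skip_zero are hypothetical, introduced only for illustration, and the array is rebuilt here from the 90-step pattern in the question.

```python
import numpy as np

def closest_to_multiples(x, step, rel_tol=0.2, skip_zero=True):
    """Hypothetical helper: for each multiple of `step`, mark the closest
    value in `x`, provided it lies within rel_tol * step of that multiple."""
    x = np.asarray(x, dtype=float)
    start = step if skip_zero else 0
    multiples = np.arange(start, x.max() + step, step)
    # Shape (n_multiples, n_values): distance from every multiple to every value.
    dist = np.abs(multiples[:, np.newaxis] - x)
    idx = np.argmin(dist, axis=1)
    # Distance of each multiple to its closest value, checked against the range.
    within = dist[np.arange(len(multiples)), idx] < rel_tol * step
    mask = np.zeros(x.shape, dtype=bool)
    mask[idx[within]] = True
    return mask

# Assumed reconstruction of the question's array: 90, 180, ..., 10800.
x = 90.0 * np.arange(1, 121)
mask = closest_to_multiples(x, 4300)
print(x[mask])
```

With skip_zero=False the zero multiple is included again, so 90 is marked as well. The broadcasted distance matrix has one row per multiple and one column per value, which is fine for arrays of this size but grows quickly for large inputs.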

Upvotes: 1
