Reputation: 85
I have a numpy array such as this
[[ 0, 57],
 [ 7, 72],
 [ 2, 51],
 [ 8, 67],
 [ 4, 42]]
I want to find, for each row, how many elements in the 2nd column are within a certain distance (say, 10) of that row's 2nd-column value. So in this example, the solution would be
[[ 0, 57, 3],
 [ 7, 72, 2],
 [ 2, 51, 3],
 [ 8, 67, 3],
 [ 4, 42, 2]]
So [first row, third column] is 3 because there are 3 elements in the 2nd column (57, 51, 67) that are within distance 10 of 57. Similarly for each row.
Any help would be appreciated!
Upvotes: 2
Views: 1183
Reputation: 353459
Here's a non-broadcasting approach, which takes advantage of the fact that to know how many numbers are within 3 of 10, you can subtract the count of numbers strictly less than 7 from the count of numbers <= 13.
import numpy as np

def broadcast(x, width):
    # broadcasting version, for comparison
    return (np.abs(x[:,None] - x) <= width).sum(1)

def largest_leq(arr, x, allow_equal=True):
    # index of the largest element of sorted arr that is <= x
    # (strictly < x when allow_equal=False); -1 if no such element
    maybe = np.searchsorted(arr, x)
    maybe = maybe.clip(0, len(arr) - 1)
    above = arr[maybe] > x if allow_equal else arr[maybe] >= x
    maybe[above] -= 1
    return maybe

def faster(x, width):
    uniq, inv, counts = np.unique(x, return_counts=True, return_inverse=True)
    counts = counts.cumsum()          # counts[i] = how many values are <= uniq[i]
    low_bounds = uniq - width
    low_ix = largest_leq(uniq, low_bounds, allow_equal=False)
    low_counts = counts[low_ix]
    low_counts[low_ix < 0] = 0        # nothing is below the lower bound
    high_bounds = uniq + width
    high_counts = counts[largest_leq(uniq, high_bounds)]
    delta = high_counts - low_counts  # values within [uniq-width, uniq+width]
    out = delta[inv]                  # map back to the original positions
    return out
This passes my tests:
for width in range(1, 10):
    for window in range(5):
        for trial in range(10):
            x = np.random.randint(0, 10, width)
            b = broadcast(x, window).tolist()
            f = faster(x, window).tolist()
            assert b == f
and behaves pretty well even at larger sizes:
In [171]: x = np.random.random(10**6)
In [172]: %time faster(x, 0)
Wall time: 386 ms
Out[172]: array([1, 1, 1, ..., 1, 1, 1], dtype=int64)
In [173]: %time faster(x, 1)
Wall time: 372 ms
Out[173]: array([1000000, 1000000, 1000000, ..., 1000000, 1000000, 1000000], dtype=int64)
In [174]: x = np.random.randint(0, 10, 10**6)
In [175]: %timeit faster(x, 3)
10 loops, best of 3: 83 ms per loop
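For reference, the same counting trick can be sketched in its most condensed form with two `np.searchsorted` calls on a sorted copy; this is a simplified variant, not the exact code above:

```python
import numpy as np

# Condensed sketch of the counting trick: on sorted data, the number of
# values v with |v - x| <= w is (count of values <= x + w) minus
# (count of values strictly < x - w).
x = np.array([57, 72, 51, 67, 42])   # the question's 2nd column
w = 10
s = np.sort(x)
hi = np.searchsorted(s, x + w, side='right')  # values <= x + w
lo = np.searchsorted(s, x - w, side='left')   # values <  x - w
print(hi - lo)  # [3 2 3 3 2]
```

The `side` arguments do the work of `largest_leq` here: `'right'` counts values `<=` the bound, `'left'` counts values strictly `<` it.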
Upvotes: 1
Reputation: 221674
Here's one approach leveraging broadcasting with outer-subtraction -

(np.abs(a[:,1,None] - a[:,1]) <= 10).sum(1)

With the outer subtract builtin np.subtract.outer and count_nonzero for counting -

np.count_nonzero(np.abs(np.subtract.outer(a[:,1], a[:,1])) <= 10, axis=1)
Sample run -
# Input array
In [23]: a
Out[23]:
array([[ 0, 57],
       [ 7, 72],
       [ 2, 51],
       [ 8, 67],
       [ 4, 42]])
# Get count
In [24]: count = (np.abs(a[:,1,None] - a[:,1]) <= 10).sum(1)
In [25]: count
Out[25]: array([3, 2, 3, 3, 2])
# Stack with input
In [26]: np.c_[a,count]
Out[26]:
array([[ 0, 57,  3],
       [ 7, 72,  2],
       [ 2, 51,  3],
       [ 8, 67,  3],
       [ 4, 42,  2]])
Alternatively with SciPy's cdist -
In [53]: from scipy.spatial.distance import cdist
In [54]: (cdist(a[:,None,1],a[:,1,None], 'minkowski', p=2)<=10).sum(1)
Out[54]: array([3, 2, 3, 3, 2])
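Since the points here are 1-D, the `'cityblock'` metric (absolute difference) gives the same distances as the `'minkowski', p=2` call above; a sketch:

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[0, 57], [7, 72], [2, 51], [8, 67], [4, 42]])
col = a[:, 1, None].astype(float)   # 2nd column as (n, 1) points
# For 1-D points, cityblock distance is just |x - y|, so this matches
# the minkowski call above.
count = (cdist(col, col, 'cityblock') <= 10).sum(1)
print(count)  # [3 2 3 3 2]
```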
For a million rows in the input, we might want to resort to a loopy one to keep the memory footprint down -

n = len(a)
count = np.empty(n, dtype=int)
for i in range(n):
    count[i] = np.count_nonzero(np.abs(a[:,1] - a[i,1]) <= 10)
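The loopy version, assembled into a self-contained function (the name `count_within` is illustrative, not from the answer) and stacked with the input as in the question:

```python
import numpy as np

# count_within is a hypothetical helper name; it wraps the loop above
# and appends the count as a 3rd column.
def count_within(a, dist=10):
    n = len(a)
    count = np.empty(n, dtype=int)
    for i in range(n):
        # compare row i's 2nd-column value against the whole 2nd column
        count[i] = np.count_nonzero(np.abs(a[:, 1] - a[i, 1]) <= dist)
    return np.c_[a, count]

a = np.array([[0, 57], [7, 72], [2, 51], [8, 67], [4, 42]])
print(count_within(a))
```

This trades speed for O(n) memory per iteration instead of the O(n^2) intermediate that broadcasting builds.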
Upvotes: 2