johnnyD

Reputation: 59

Find all occurrences of a specified match of two numbers in a numpy array

What I need is an array of all the indexes at which my data array, which is filled with zeros and ones, steps from zero to one. I need a very fast solution, because I have to work with millions of arrays, each hundreds of millions of elements long. It will be running in a computing centre. For instance:

data_array = np.array([1,1,0,1,1,1,0,0,0,1,1,1,0,1,1,0])
result = [3,9,13]

Upvotes: 2

Views: 89

Answers (4)

Divakar

Reputation: 221574

Since the array contains only 0s and 1s, you can compare the one-shifted versions of the array instead of performing arithmetic on them. The comparison directly gives a boolean array, which can be fed to np.flatnonzero to get the indices and the final output.

Thus, we would have an implementation like so -

np.flatnonzero(data_array[1:] > data_array[:-1])+1
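
As a quick check on the array from the question, here is a minimal sketch of the same comparison with the intermediate boolean array spelled out (the names `rising` and `idx` are only illustrative):

```python
import numpy as np

data_array = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0])

# True wherever an element is larger than its predecessor, i.e. a 0 -> 1 step.
rising = data_array[1:] > data_array[:-1]

# Position k in `rising` compares elements k and k+1, hence the +1.
idx = np.flatnonzero(rising) + 1
print(idx)  # [ 3  9 13]
```

A side benefit of comparing is that it should also work if the data is stored as a boolean array, where recent NumPy versions refuse the `-` operator.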

Runtime test -

In [26]: a = np.random.choice([0,1], 10**8)

In [27]: %timeit np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1
1 loop, best of 3: 1.91 s per loop

In [28]: %timeit np.where(np.diff(a)==1)[0] + 1
1 loop, best of 3: 1.91 s per loop

In [29]: %timeit np.flatnonzero(a[1:] > a[:-1])+1
1 loop, best of 3: 954 ms per loop
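
To repeat the comparison outside IPython, something along these lines could be used. It is only a sketch: the seed, the smaller array size, and the `candidates` dictionary are my own choices, and the absolute timings will differ by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)       # seed chosen arbitrarily
a = rng.integers(0, 2, size=10**6)   # smaller than the 10**8 used above

candidates = {
    "subtract": lambda: np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1,
    "diff": lambda: np.where(np.diff(a) == 1)[0] + 1,
    "compare": lambda: np.flatnonzero(a[1:] > a[:-1]) + 1,
}

# Sanity check first: all three approaches should return identical indices.
results = {name: fn() for name, fn in candidates.items()}
assert all(np.array_equal(results["compare"], r) for r in results.values())

# Best of 3 single runs per approach, reported in milliseconds.
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name:8s} {best * 1e3:.1f} ms")
```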

Upvotes: 0

johnnyD

Reputation: 59

Well, thanks a lot to all of you. The solution with nonzero is probably better for me, because I need to know the steps 0->1 as well as 1->0, and finally calculate the differences between them. So this is my solution. Any other advice is appreciated :)

# steps 0 -> 1 and 1 -> 0 (index of the element right after the step)
i_in = np.nonzero((data_array[1:] - data_array[:-1]) == 1)[0] + 1
i_out = np.nonzero((data_array[1:] - data_array[:-1]) == -1)[0] + 1

i_return_in_time = i_in - i_out[:i_in.size]
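
A possible variant, sketched on the example array: compute the difference once and reuse it for both directions, and cut both index arrays to a common length before subtracting. This assumes a signed integer dtype (so that -1 is representable) and an array that starts with 1, so that the first step is a 1->0 one; the variable `n` is just for illustration.

```python
import numpy as np

data_array = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0])

# +1 at a 0 -> 1 step, -1 at a 1 -> 0 step (signed dtype assumed).
step = np.diff(data_array)
i_in = np.flatnonzero(step == 1) + 1     # [ 3  9 13]
i_out = np.flatnonzero(step == -1) + 1   # [ 2  6 12 15]

# Pair each 1 -> 0 step with the 0 -> 1 step that follows it.
# Assumption: the array starts with 1, so the first step is 1 -> 0.
n = min(i_in.size, i_out.size)
i_return_in_time = i_in[:n] - i_out[:n]
print(i_return_in_time)  # [1 3 1]
```

Truncating to `n` avoids a shape mismatch when there are more 0->1 steps than 1->0 steps.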

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210842

Try this (here a is the data_array from the question):

In [23]: np.where(np.diff(a)==1)[0] + 1
Out[23]: array([ 3,  9, 13], dtype=int64)

Timing for 100M element array:

In [46]: a = np.random.choice([0,1], 10**8)

In [47]: %timeit np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1
1 loop, best of 3: 1.46 s per loop

In [48]: %timeit np.where(np.diff(a)==1)[0] + 1
1 loop, best of 3: 1.64 s per loop

Upvotes: 3

Paul H

Reputation: 68146

Here's the procedure:

  1. Compute the diff of the array
  2. Find the index where the diff == 1
  3. Add 1 to the results (because diff[k] = orig[k+1] - orig[k], so len(diff) = len(orig) - 1)

So try this:

import numpy

index = numpy.nonzero((data_array[1:] - data_array[:-1]) == 1)[0] + 1
index
# [3, 9, 13]
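
The three steps of the procedure can also be written out one at a time, which makes the off-by-one easier to see. This is only an illustrative sketch on the question's array; `diff` and `positions` are names I picked for the intermediate results.

```python
import numpy as np

data_array = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0])

# 1. Diff of the array: element k holds data_array[k+1] - data_array[k].
diff = data_array[1:] - data_array[:-1]

# 2. Positions where the diff equals 1 (a 0 -> 1 step).
positions = np.nonzero(diff == 1)[0]   # [ 2  8 12]

# 3. Shift by one so the index points at the 1 that follows the 0.
index = positions + 1
print(index)  # [ 3  9 13]
```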

Upvotes: 1
