Reputation: 1381
I have a numpy binary array like this:
Array A = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
I would like to count how many 0s are there at the left of each 1, and return it in an other array that would look like this for this for this example:
nb_1s = [0, 0, 1, 2, 2, 5]
There are no 0s at the left for the two first 1s so the the first two numbers of the array are 0 etc...
I know that first I have to initiate an array with number of 1s in my array:
def give_zeros(binary_array):
binary_array = np.asarray(binary_array)
nb_zeros = np.zeros(binary_array.sum())
return nb_zeros
But I'm not sure on how to count the number of zeros. Should I iterate in a for loop with 'nditer'? It doesn't seem efficient as i will have to run this function on very large arrays.
Do you have any ideas? Thank you.
Upvotes: 4
Views: 1665
Reputation: 221564
Here's a vectorized way with differentiation of range array from the indices of 1s
-
def leftzeros_count(a):
idx = np.flatnonzero(a!=0)
return idx - np.arange(len(idx))
Sample runs -
In [298]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
In [299]: leftzeros_count(a)
Out[299]: array([0, 0, 1, 2, 2, 5])
In [300]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
In [301]: leftzeros_count(a)
Out[301]: array([1, 1, 2, 3, 3, 6])
In [302]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])
In [303]: leftzeros_count(a)
Out[303]: array([ 1, 1, 2, 3, 3, 6, 10])
Runtime test
For the timings, let's tile the given sample a large number of times and time the vectorized approaches -
In [7]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
In [8]: a = np.tile(a,100000)
# @Eric Duminil's soln
In [9]: %timeit (a == 0).cumsum()[a > 0]
100 loops, best of 3: 10.9 ms per loop
# Proposed in this post
In [10]: %timeit leftzeros_count(a)
100 loops, best of 3: 3.71 ms per loop
Upvotes: 3
Reputation: 101
If the count is cumulative (as per your example) then you can do this easily in O(n). Simply have a counter that increases by one every time you find a zero and then append the value of the counter variable to another array for every one you hit in your initial array.
Upvotes: 1
Reputation: 54223
You could use:
(A == 0).cumsum()[A > 0]
# array([0, 0, 1, 2, 2, 5])
or:
(~A).cumsum()[A]
# array([0, 0, 1, 2, 2, 5])
if A
is a bool array.
A == 0
is a boolean array which is True
for each 0
:
>>> import numpy as np
>>> A = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
>>> A == 0
array([False, False, True, False, True, False, False, True, True,
True, False, True, True, True, True], dtype=bool)
You can use cumsum()
to count the number of True
s:
>>> (A == 0).cumsum()
array([0, 0, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9])
You only need the values where A > 0
:
>>> (A == 0).cumsum()[A > 0]
array([0, 0, 1, 2, 2, 5])
Done!
Upvotes: 4
Reputation: 122052
In the non-vectorized manner:
>>> x = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
>>> c, y = 0, []
>>> for i in x:
... if i == 1:
... y.append(c)
... else:
... c += 1
...
>>> y
[0, 0, 1, 2, 2, 5]
For vectorized solution, see @Divakar's answer:
In numpy
, first find the non-zero indices, with np.nonzero()
:
>>> np.nonzero(x)[0]
array([ 0, 1, 3, 5, 6, 10])
Then subtract that with the range array of length of indices:
>>> idx = np.nonzero(x)[0]
>>> np.arange(len(idx))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(len(idx))
array([0, 0, 1, 2, 2, 5])
>>> np.arange(x.count(1))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(x.count(1))
array([0, 0, 1, 2, 2, 5])
Upvotes: 2