Reputation: 520
Given a 1 dimensional array of Values:
A = [x,..,x,0,..,0,x,..,x,0,..,0,x,..,x,........]
where:
x,..,x stands for for an arbitrary number of arbitrary Values
and
0,..,0 stands for an arbitrary number of Zeros
I need to find a fast algorithm to find the indices of the borders i.e.: ..,x,0,.. and ..,0,x..
This problem seems to lend itself to parallelization but that is beyond my experience simple looping over the array is to slow as the data is to big
THX Martin
Upvotes: 4
Views: 1656
Reputation: 284582
@chthonicdaemon's answer gets you 90% of the way there, but if you actually want to use the indices to chop up your array, you need some additional information.
Presumably, you want to use the indicies to extract the regions of the array that aren't 0. You've found the indices where the array changes, but you don't know if the change was from True
to False
or the opposite way around. Therefore, you need to check the first and last values and adjust accordingly. Otherwise, you'll wind up extracting the segment of zeros instead of data in some cases.
For example:
import numpy as np
def contiguous_regions(condition):
"""Finds contiguous True regions of the 1D boolean array "condition".
Returns a 2D array where the first column is the start index of the region
and the second column is the end index."""
# Find the indicies of changes in "condition"
idx = np.flatnonzero(np.diff(condition)) + 1
# Prepend or append the start or end indicies to "idx"
# if there's a block of "True"'s at the start or end...
if condition[0]:
idx = np.append(0, idx)
if condition[-1]:
idx = np.append(idx, len(condition))
return idx.reshape(-1, 2)
# Generate an example dataset...
t = np.linspace(0, 4*np.pi, 20)
x = np.abs(np.sin(t)) + 0.1
x[np.sin(t) < 0.5] = 0
print x
# Get the contiguous regions where x is not 0
for start, stop in contiguous_regions(x != 0):
print x[start:stop]
So in this case, our example dataset looks like:
array([ 0. , 0.71421271, 1.06940027, 1.01577333, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.93716648, 1.09658449, 0.83572391, 0. ,
0. , 0. , 0. , 0. , 0. ])
And by doing:
for start, stop in contiguous_regions(x != 0):
print x[start:stop]
We'll get:
[ 0.71421271 1.06940027 1.01577333]
[ 0.93716648 1.09658449 0.83572391]
Upvotes: 2
Reputation: 19750
This should at least push the looping down into the Numpy primitives, although it will traverse the array three times:
A = 2*(rand(200000)>0.2) # testing data
borders = flatnonzero(diff(A==0))
This takes 1.79 ms on my computer.
Upvotes: 0