martburg
martburg

Reputation: 520

Numpy Detection of region borders

Given a 1 dimensional array of Values:

A = [x,..,x,0,..,0,x,..,x,0,..,0,x,..,x,........]

where:

x,..,x stands for for an arbitrary number of arbitrary Values

and

0,..,0 stands for an arbitrary number of Zeros

I need to find a fast algorithm to find the indices of the borders i.e.: ..,x,0,.. and ..,0,x..

This problem seems to lend itself to parallelization but that is beyond my experience simple looping over the array is to slow as the data is to big

THX Martin

Upvotes: 4

Views: 1656

Answers (2)

Joe Kington
Joe Kington

Reputation: 284582

@chthonicdaemon's answer gets you 90% of the way there, but if you actually want to use the indices to chop up your array, you need some additional information.

Presumably, you want to use the indicies to extract the regions of the array that aren't 0. You've found the indices where the array changes, but you don't know if the change was from True to False or the opposite way around. Therefore, you need to check the first and last values and adjust accordingly. Otherwise, you'll wind up extracting the segment of zeros instead of data in some cases.

For example:

import numpy as np

def contiguous_regions(condition):
    """Finds contiguous True regions of the 1D boolean array "condition".
    Returns a 2D array where the first column is the start index of the region
    and the second column is the end index."""
    # Find the indicies of changes in "condition"
    idx = np.flatnonzero(np.diff(condition)) + 1

    # Prepend or append the start or end indicies to "idx"
    # if there's a block of "True"'s at the start or end...
    if condition[0]:
        idx = np.append(0, idx)
    if condition[-1]:
        idx = np.append(idx, len(condition))

    return idx.reshape(-1, 2)

# Generate an example dataset...
t = np.linspace(0, 4*np.pi, 20)
x = np.abs(np.sin(t)) + 0.1
x[np.sin(t) < 0.5] = 0

print x

# Get the contiguous regions where x is not 0
for start, stop in contiguous_regions(x != 0):
    print x[start:stop]

So in this case, our example dataset looks like:

array([ 0.        ,  0.71421271,  1.06940027,  1.01577333,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.93716648,  1.09658449,  0.83572391,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ])

And by doing:

for start, stop in contiguous_regions(x != 0):
    print x[start:stop]

We'll get:

[ 0.71421271  1.06940027  1.01577333]
[ 0.93716648  1.09658449  0.83572391]

Upvotes: 2

chthonicdaemon
chthonicdaemon

Reputation: 19750

This should at least push the looping down into the Numpy primitives, although it will traverse the array three times:

A = 2*(rand(200000)>0.2)  # testing data
borders = flatnonzero(diff(A==0))

This takes 1.79 ms on my computer.

Upvotes: 0

Related Questions