slaw
slaw

Reputation: 6869

Find Consecutive Repeats of Specific Length in NumPy

Say that I have a NumPy array:

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])

And I have a length m = 2 that the user specifies in order to see if there are any repeats of that length within the time series. In this case, the repeats of length m = 2 are:

[2, 2]
[5, 5]
[9, 9]
[9, 9]
[13, 13]

And the user can change this to m = 3 and the repeats of length m = 3 are:

[9, 9, 9]
[13, 13, 13]

I need a function that either returns the index of where a repeat is found or None. So, for m = 3 the function would return the following NumPy array of starting indices:

[11, 17]

And for m = 4 the function would return None. What's the cleanest and fastest way to accomplish this?

Update Note that the array does not have to be sorted and we are not interested in the result after a sort. We only want the result from the unsorted array. Your result for m = 2 should be the same for this array:

b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])

Upvotes: 4

Views: 1063

Answers (6)

FObersteiner
FObersteiner

Reputation: 25544

a numba njit'ed function:

import numpy as np
import numba as nb

@nb.njit
def find_repeats(arr, n):
    indices = []
    current = arr[0]
    count = 0
    for idx, item in enumerate(arr[1:]):
        if item == current:
            count += 1
            if count >= n-1:
                indices.append(idx)
        else:
            count = 0
            current = item
    return indices if indices else None

b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])

print(find_repeats(b, 2))
# [2, 6, 11, 12, 17, 18]

print(find_repeats(b, 3))
# [12, 18]

Less elegant but fast, especially for smaller array sizes - simple_benchmark comparison vs. Divakar's functions: benchmark

Upvotes: 0

Divakar
Divakar

Reputation: 221514

Approach #1

We could leverage 1D convolution for a vectorized solution -

def consec_repeat_starts(a, n):
    N = n-1
    m = a[:-1]==a[1:]
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1

Sample runs -

In [286]: a
Out[286]: 
array([ 0,  1,  2,  2,  3,  4,  5,  5,  6,  7,  8,  9,  9,  9, 10, 11, 12,
       13, 13, 13, 14, 15])

In [287]: consec_repeat_starts(a, 2)
Out[287]: array([ 2,  6, 11, 12, 17, 18])

In [288]: consec_repeat_starts(a, 3)
Out[288]: array([11, 17])

In [289]: consec_repeat_starts(a, 4)
Out[289]: array([], dtype=int64)

Approach #2

We could also make use of binary-erosion -

from scipy.ndimage.morphology import binary_erosion

def consec_repeat_starts_v2(a, n):
    N = n-1
    m = a[:-1]==a[1:]
    return np.flatnonzero(binary_erosion(m,[1]*N))-(N//2)

Upvotes: 3

Daniel F
Daniel F

Reputation: 14399

We can also use a recursive function:

def recursive_repeat(arr, k):
    if k == 1:
        return np.flatnonzero(arr)
    else:
        new_arr = np.zeros(arr.size - 1)
        mask = arr[1:] == arr[:-1]
        new_arr[mask] = arr[:-1][mask]
        return recursive_repeat(new_arr, k-1)


recursive_repeat(a, 2)
Out[]: array([ 2,  6, 11, 12, 17, 18], dtype=int64)

recursive_repeat(a, 3)
Out[]: array([11, 17], dtype=int64)

Upvotes: 0

Michael Richardson
Michael Richardson

Reputation: 194

If you just want to check for consecutive similar values in an unsorted array this method will still work:

import numpy as np

nums = np.array([11, 0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])

m = 3

equals = []

for ii in range(len(nums)):
    if ii >= m:
        sliced = nums[ii-m:ii]
        if np.unique(sliced).size == 1:
             equals.append(sliced)

print(np.array(equals))

This will give the output for m=2:

[[ 2  2]
[ 5  5]
[ 9  9]
[ 9  9]
[13 13]
[13 13]]

This will give the output for m=3:

[[ 9  9  9]
[13 13 13]]

Upvotes: 0

Jose Macedo
Jose Macedo

Reputation: 308

I found this one, quite simillar to the otherone posted, but it should work. It only work if the array is sorted

def find(a,m):
    index=[]
    prev=0
    n =1
    for i in range(1,len(a)):
        if a[prev] == a[i]:
            n+=1
            if(m==n):
                index.append(i-m+1)
                n=1
        else:
            n=1
        prev=i
    return index if len(index)>0 else None
find(a,3)

Upvotes: 0

Joan Puigcerver
Joan Puigcerver

Reputation: 104

I've come up with this solution, maybe it's not the cleanest.

I use a regular array instead of a numpy one, but it should work the same way.

a = [0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15]
m = 2


def index_repeated_length( list, length ) :
    previous = None
    count = 1
    for i, element in enumerate(list) :
        if element == previous :
            count += 1
        else :
            previous = element
            count = 1

        if count >= length :
            yield i - length + 1

print( list( index_repeated_length(a, m) ) )

Output:

[2, 6, 11, 12, 17, 18]

Since it's never sorted you have to iterate through it, so the complexity should be linear O(n)

Upvotes: 0

Related Questions