Reputation: 6869
Say that I have a NumPy array:
a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
And I have a length m = 2
that the user specifies in order to see if there are any repeats of that length within the time series. In this case, the repeats of length m = 2
are:
[2, 2]
[5, 5]
[9, 9]
[9, 9]
[13, 13]
And the user can change this to m = 3
and the repeats of length m = 3
are:
[9, 9, 9]
[13, 13, 13]
I need a function that either returns the index of where a repeat is found or None
. So, for m = 3
the function would return the following NumPy array of starting indices:
[11, 17]
And for m = 4
the function would return None
. What's the cleanest and fastest way to accomplish this?
Update
Note that the array does not have to be sorted and we are not interested in the result after a sort. We only want the result from the unsorted array. Your result for m = 2
should be the same for this array:
b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])
Upvotes: 4
Views: 1063
Reputation: 25544
a numba
njit'ed function:
import numpy as np
import numba as nb
@nb.njit
def find_repeats(arr, n):
indices = []
current = arr[0]
count = 0
for idx, item in enumerate(arr[1:]):
if item == current:
count += 1
if count >= n-1:
indices.append(idx)
else:
count = 0
current = item
return indices if indices else None
b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])
print(find_repeats(b, 2))
# [2, 6, 11, 12, 17, 18]
print(find_repeats(b, 3))
# [12, 18]
Less elegant but fast, especially for smaller array sizes - simple_benchmark
comparison vs. Divakar's functions:
Upvotes: 0
Reputation: 221514
Approach #1
We could leverage 1D convolution
for a vectorized solution -
def consec_repeat_starts(a, n):
N = n-1
m = a[:-1]==a[1:]
return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1
Sample runs -
In [286]: a
Out[286]:
array([ 0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12,
13, 13, 13, 14, 15])
In [287]: consec_repeat_starts(a, 2)
Out[287]: array([ 2, 6, 11, 12, 17, 18])
In [288]: consec_repeat_starts(a, 3)
Out[288]: array([11, 17])
In [289]: consec_repeat_starts(a, 4)
Out[289]: array([], dtype=int64)
Approach #2
We could also make use of binary-erosion
-
from scipy.ndimage.morphology import binary_erosion
def consec_repeat_starts_v2(a, n):
N = n-1
m = a[:-1]==a[1:]
return np.flatnonzero(binary_erosion(m,[1]*N))-(N//2)
Upvotes: 3
Reputation: 14399
We can also use a recursive function:
def recursive_repeat(arr, k):
if k == 1:
return np.flatnonzero(arr)
else:
new_arr = np.zeros(arr.size - 1)
mask = arr[1:] == arr[:-1]
new_arr[mask] = arr[:-1][mask]
return recursive_repeat(new_arr, k-1)
recursive_repeat(a, 2)
Out[]: array([ 2, 6, 11, 12, 17, 18], dtype=int64)
recursive_repeat(a, 3)
Out[]: array([11, 17], dtype=int64)
Upvotes: 0
Reputation: 194
If you just want to check for consecutive similar values in an unsorted array this method will still work:
import numpy as np
nums = np.array([11, 0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
m = 3
equals = []
for ii in range(len(nums)):
if ii >= m:
sliced = nums[ii-m:ii]
if np.unique(sliced).size == 1:
equals.append(sliced)
print(np.array(equals))
This will give the output for m=2:
[[ 2 2]
[ 5 5]
[ 9 9]
[ 9 9]
[13 13]
[13 13]]
This will give the output for m=3:
[[ 9 9 9]
[13 13 13]]
Upvotes: 0
Reputation: 308
I found this one, quite simillar to the otherone posted, but it should work. It only work if the array is sorted
def find(a,m):
index=[]
prev=0
n =1
for i in range(1,len(a)):
if a[prev] == a[i]:
n+=1
if(m==n):
index.append(i-m+1)
n=1
else:
n=1
prev=i
return index if len(index)>0 else None
find(a,3)
Upvotes: 0
Reputation: 104
I've come up with this solution, maybe it's not the cleanest.
I use a regular array instead of a numpy one, but it should work the same way.
a = [0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15]
m = 2
def index_repeated_length( list, length ) :
previous = None
count = 1
for i, element in enumerate(list) :
if element == previous :
count += 1
else :
previous = element
count = 1
if count >= length :
yield i - length + 1
print( list( index_repeated_length(a, m) ) )
Output:
[2, 6, 11, 12, 17, 18]
Since it's never sorted you have to iterate through it, so the complexity should be linear O(n)
Upvotes: 0