Olga
Olga

Reputation: 1455

Return the indexes of a sub-array in an array

I use Python with numpy.

I have a numpy array may_a:

may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])

I have a numpy array may_b:

may_b = numpy.array([False,True,True,False])

I need to find array may_b in array may_a.

In the output I need to get indexes of occurrences.

out_index=[2,7]

Can someone please suggest, how do I get out_index?

Upvotes: 8

Views: 5844

Answers (5)

Bi Rico
Bi Rico

Reputation: 25833

This looks very similar to a string search problem. If you want to avoid implementing one these string search algorithms, you could abuse pythons built in string search, which is very fast, by doing something like:

# I've added [True, True, True] at the end.
may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False, True, True, True])
may_b = numpy.array([False,True,True,False])

may_a_str = may_a.tostring()
may_b_str = may_b.tostring()

idx = may_a_str.find(may_b_str)
out_index = []
while idx >= 0:
    out_index.append(idx)
    idx = may_a_str.find(may_b_str, idx+1)

This should work fine for boolean arrays. If you want to use this approach for another array type, you'll need to make sure the strides of the two arrays match and divide out_index by that stride.

You could also use the regular expression module instead of the loop to do the string search.

Upvotes: 3

Jaime
Jaime

Reputation: 67507

EDIT The following code does allow to perform a convolution based check of equality. It maps True to 1 and False to -1. It also reverses b, which is needed for it to work properly:

def search(a, b) :
    return np.where(np.round(fftconvolve(a * 2 - 1, (b * 2 - 1)[::-1],
                                         mode='valid') - len(b)) == 0)[0]

I have checked that it gives the same output as the as_strided method for a variety of random inputs, which it does. I have also timed both approached, and convolution only starts paying off with largish search tokens of around 256 items.


It seems like a little overkill, but with boolean data you can use (abuse?) convolution:

In [8]: np.where(np.convolve(may_a, may_b.astype(int),
   ...:                      mode='valid') == may_b.sum())[0]
Out[8]: array([2, 7])

For larger datasets it may be faster to go with scipy.signal.fftconvolve:

In [13]: np.where(scipy.signal.fftconvolve(may_a, may_b,
   ....:                                   mode='valid') == may_b.sum())[0]
Out[13]: array([2, 7])

You have to be careful though, because the output now is floating point, and rounding may spoil the equality check:

In [14]: scipy.signal.fftconvolve(may_a, may_b, mode='valid')
Out[14]: array([ 1.,  1.,  2.,  1.,  1.,  1.,  1.,  2.])

So you may be better off with something along the lines of:

In [15]: np.where(np.round(scipy.signal.fftconvolve(may_a, may_b, mode='valid') -
   ....:                   may_b.sum()) == 0)[0]
Out[15]: array([2, 7])

Upvotes: 5

root
root

Reputation: 80456

This should also work with other that boolean data:

In [1]: import numpy as np

In [2]: a = np.array([False, True, False, True, True, False, True, False, True, True, False])

In [3]: b = np.array([False,True,True,False])

In [4]: def get_indices(a, b):
   ...:     window = len(b)
   ...:     shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
   ...:     strides = a.strides + (a.strides[-1],)
   ...:     w = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
   ...:     return np.where(np.all(np.equal(w,b),1) == True)[0]

In [5]: get_indices(a,b)
Out[5]: array([2, 7])

Upvotes: 2

Jaime
Jaime

Reputation: 67507

A much cooler approach, which may not perform a well, but which works for any dtype, is to use as_strided:

In [2]: from numpy.lib.stride_tricks import as_strided

In [3]: may_a = numpy.array([False, True, False, True, True, False,
   ...:                      True, False, True, True, False])

In [4]: may_b = numpy.array([False,True,True,False])

In [5]: a = len(may_a)

In [6]: b = len(may_b)

In [7]: a_view = as_strided(may_a, shape=(a - b + 1, b),
   ...:                     strides=(may_a.dtype.itemsize,) * 2)

In [8]: a_view
Out[8]: 
array([[False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False],
       [ True,  True, False,  True],
       [ True, False,  True, False],
       [False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False]], dtype=bool)

In [9]: numpy.where(numpy.all(a_view == may_b, axis=1))[0]
Out[9]: array([2, 7])

You have to be careful though, because even though a_view is a view of may_a's data, when comparing it with may_b a temporary array of (a - b + 1) * b is created, which may be a problem with large as and bs.

Upvotes: 5

Konfle Dolex
Konfle Dolex

Reputation: 443

I am not sure whether numpy provide a function for that. If it does not, here is a solution:

import numpy

def searchListIndexs(array, target):
    ret = []
    iLimit = len(array)-len(target)+1
    jLimit = len(target)
    for i in range(iLimit):
        for j in range(jLimit):
            if array[i+j] != target[j]:
                break
        else:
            ret.append(i)
    return ret


may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])
may_b = numpy.array([False,True,True,False])
out_index = searchListIndexs(may_a, may_b)
print out_index #If you are using Python 3, then use print(out_index) instead.

Upvotes: 1

Related Questions