David Jones

Reputation: 5184

Can I find out if one numpy vector appears as a slice of another?

I want to find out if my numpy vector, needle, appears inside another vector, haystack, as a slice, or contiguous sub-vector.

I want a function find(needle, haystack) that returns True if and only if there exist integer indexes p and q such that needle equals haystack[p:q], where "equals" means the elements are equal at every position.

Example:

find([2,3,4], [1,2,3,4,5]) == True
find([2,4], [1,2,3,4,5]) == False  # not contiguous inside haystack
find([2,3,4], [0,1,2,3]) == False  # incomplete

Here I am using lists to simplify the illustration, but really they would be numpy vectors (1-dimensional arrays).

For strings in Python, the equivalent operation is trivial, using the in operator: ("bcd" in "abcde") == True.


An appendix on dimensionality.

Dear reader, you might be tempted by similar-looking questions, such as testing whether a Numpy array contains a given row, or Checking if a NumPy array contains another array. But a consideration of dimensions shows that the similarity is not helpful.

A vector is a one-dimensional array. In numpy terms a vector of length N will have .shape == (N,); its shape has length 1.

The other referenced questions generally seek an exact match for a row in a 2-dimensional matrix.

I am seeking to slide my 1-dimensional needle along the same axis of my 1-dimensional haystack like a window, until the entire needle matches the portion of the haystack that is visible through the window.

Upvotes: 1

Views: 907

Answers (2)

Georgina Skibinski

Reputation: 13397

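Try a comprehension that compares each window of x against a. The comparison uses np.array_equal because a plain == on numpy vectors is elementwise, and any() then raises a ValueError when the needle has more than one element:

import numpy as np

def find(a, x):
    return any(np.array_equal(x[i:i+len(a)], a) for i in range(1+len(x)-len(a)))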

Outputs:

print(find([2,3,4], [1,2,3,4,5]),
find([2,4], [1,2,3,4,5]),
find([2,3,4], [0,1,2,3]), find([2,3,4], [0,1,2,3,4,5,2,3,4]))
>> True False False True
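The same window-by-window comparison can also be written without a Python-level loop. This is only a minimal sketch: it assumes NumPy 1.20 or later (for sliding_window_view), and the name find_windowed is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_windowed(needle, haystack):
    """Return True if needle occurs in haystack as a contiguous slice."""
    needle = np.asarray(needle)
    haystack = np.asarray(haystack)
    # Assumption: an empty needle matches, as "" does for strings
    if needle.size == 0:
        return True
    # A needle longer than the haystack can never fit in a window
    if needle.size > haystack.size:
        return False
    # One row per window position; this is a view, not a copy
    windows = sliding_window_view(haystack, needle.size)
    # Compare every window to the needle; a match is a row that is all True
    return bool((windows == needle).all(axis=1).any())

print(
    find_windowed(np.array([2, 3, 4]), np.array([1, 2, 3, 4, 5])),
    find_windowed(np.array([2, 4]), np.array([1, 2, 3, 4, 5])),
    find_windowed(np.array([2, 3, 4]), np.array([0, 1, 2, 3])),
    find_windowed(np.array([2, 3, 4]), np.array([0, 1, 2, 3, 4, 5, 2, 3, 4])),
)
```

The window view itself costs no extra memory, but the intermediate boolean array has one entry per (window, needle element) pair, so this trades memory for speed on long needles.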

Upvotes: 0

hilberts_drinking_problem

Reputation: 11602

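If you are fine with creating copies of the two arrays, you could fall back on Python's in operator for bytes objects. Two caveats: both arrays must have the same dtype, and a match can start at a byte offset that is not a multiple of the item size, which gives a false positive (for example, with 8-byte integers the bytes of [1] appear inside the bytes of [256, 0]):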

import numpy as np

def find(a, b):
    # Substring search on the raw bytes of both arrays
    return a.tobytes() in b.tobytes()

print(
    find(np.array([2,3,4]), np.array([1,2,3,4,5])),
    find(np.array([2,4]),   np.array([1,2,3,4,5])),
    find(np.array([2,3,4]), np.array([0,1,2,3])),
    find(np.array([2,3,4]), np.array([0,1,2,3,4,5,2,3,4])),
)

# True False False True
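If unaligned byte matches are a concern, one possible variant keeps the bytes search but only accepts matches that start on an element boundary. This is a sketch under assumptions, not a tested replacement: the name find_aligned is hypothetical, and raising on a dtype mismatch is just one way to handle that case.

```python
import numpy as np

def find_aligned(needle, haystack):
    """Bytes-based search that ignores matches starting mid-element."""
    needle = np.ascontiguousarray(needle)
    haystack = np.ascontiguousarray(haystack)
    # Assumption: refuse mixed dtypes rather than cast silently
    if needle.dtype != haystack.dtype:
        raise TypeError("needle and haystack must have the same dtype")
    # Assumption: an empty needle matches, as "" does for strings
    if needle.size == 0:
        return True
    n, h = needle.tobytes(), haystack.tobytes()
    step = haystack.dtype.itemsize
    pos = h.find(n)
    while pos != -1:
        # Accept only matches that begin on an element boundary
        if pos % step == 0:
            return True
        # Otherwise keep searching from the next byte
        pos = h.find(n, pos + 1)
    return False

print(
    find_aligned(np.array([2, 3, 4]), np.array([1, 2, 3, 4, 5])),
    find_aligned(np.array([2, 4]), np.array([1, 2, 3, 4, 5])),
    find_aligned(np.array([1], dtype=np.int64), np.array([256, 0], dtype=np.int64)),
)
```

Because the comparison is still bytewise, floating-point values follow byte equality rather than ==: 0.0 and -0.0 do not match each other, while NaNs with identical bit patterns do.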

Upvotes: 1
