ironv

Reputation: 1058

identifying sub-arrays in numpy

I have two two-dimensional arrays a and b (the number of columns in a is less than or equal to the number of columns in b). I would like to find an efficient way of matching a row in array a to a contiguous part of a row in array b.

a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24],
              [ 28, 900]])

b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])

The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]). Row 0 in array a matches row 0 in array b (first two positions) and row 1 in array b (second and third positions); row 1 in array a matches row 0 in array b (third and fourth positions); and so on.
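For reference, the matching rule can be spelled out as a naive double loop (a minimal sketch for clarity; find_matches is just an illustrative name, and this is far too slow for large arrays):

import numpy as np

def find_matches(a, b):
    # check every alignment of every row of a against every row of b
    n = a.shape[1]
    pairs = []
    for i, row in enumerate(a):
        for j in range(b.shape[0]):
            if any(np.array_equal(row, b[j, k:k + n])
                   for k in range(b.shape[1] - n + 1)):
                pairs.append([i, j])
    return np.array(pairs)

With the arrays above, this reproduces the expected output.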

Upvotes: 1

Views: 353

Answers (2)

Divakar

Reputation: 221704

We can leverage scikit-image's view_as_windows (based on np.lib.stride_tricks.as_strided) for efficient patch extraction, and then compare those patches against each row of a, all of it in a vectorized manner. Then, get the matching indices with np.argwhere -

# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows

In [428]: w = view_as_windows(b,(1,a.shape[1]))

In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]: 
array([[0, 0],
       [1, 0],
       [0, 1],
       [3, 1],
       [2, 2]])

Alternatively, we could get the indices in the order of rows in a by pushing a's first axis forward while performing broadcasted comparisons -

In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]: 
array([[0, 0],
       [0, 1],
       [1, 0],
       [2, 2],
       [3, 1]])
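
For completeness, on NumPy 1.20+ the same windows can be built without scikit-image using np.lib.stride_tricks.sliding_window_view (a sketch; it produces windows of the same shape as view_as_windows, so the expressions above apply unchanged):

from numpy.lib.stride_tricks import sliding_window_view

w = sliding_window_view(b, (1, a.shape[1]))    # same (1, a-width) windows
np.argwhere((w == a).all(-1).any(-2))[:,::-1]  # same result as Out[429]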

Upvotes: 4

rayryeng

Reputation: 104555

Another way I can think of is to loop over each row in a and perform a 2D correlation between b, which you can consider as a 2D signal, and that row. We would look for results that are equal to the sum of squares of the values in the row. If we subtract this sum of squares from the correlation result, matches show up as zeros: any row of b that gives a 0 means the subarray was found in that row. For example, for the row [25, 28] the sum of squares is 25*25 + 28*28 = 1409, and correlating it with the matching window in b also gives 25*25 + 28*28 = 1409, so the difference is 0. If you are using floating-point numbers, you may want to compare against some small threshold just above 0 instead of exactly 0.

If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.

import numpy as np
from scipy.signal import correlate2d

a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24]])

b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])

EPS = 1e-8
result = []
for (i, row) in enumerate(a):
    # a matching window makes the correlation equal the row's sum of
    # squares, so matches appear as (near-)zeros after the subtraction
    out = correlate2d(b, row[None,:], mode='valid') - np.square(row).sum()
    locs = np.where(np.abs(out) <= EPS)[0]  # rows of b containing a zero

    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, res))

We get:

In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]

The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.
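If the loop is a concern, the same sum-of-squares test can be applied to all rows of a at once with a single product over all windows of b. A sketch, assuming NumPy 1.20+ for sliding_window_view, and with the same caveat that a matching correlation value is a necessary rather than sufficient condition:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# every contiguous slice of every row of b, with the width of a's rows
win = sliding_window_view(b, a.shape[1], axis=1)    # shape (3, 3, 2) here

# corr[i, j, k] = dot(a[i], b[j, k:k+width]) for all i, j, k at once
corr = np.einsum('id,jkd->ijk', a, win)

# a match requires the dot product to equal the row's sum of squares
ss = np.square(a).sum(axis=1)
hits = np.isclose(corr, ss[:, None, None]).any(-1)  # (row of a, row of b)
result = np.argwhere(hits)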

Upvotes: 1
