numpy get the blocks making an array in a matrix

Question

I have a numpy matrix such as the one below. For each row and each column, I'd like to get an array with the lengths of the blocks of consecutive 1s it contains. How can this be done efficiently in numpy?

Examples

So if for example we have the array [1 1 1 1 0 1 1 1 1 0] (first row) then we would get [4 4] since we have 2 blocks of 4.

For the first column we would get [3 1] since we have at the start three 1-s, followed by a zero, then one 1 and then more zeros.

The mentioned matrix

[[1 1 1 1 0 1 1 1 1 0]
 [1 0 0 1 0 1 1 1 1 1]
 [1 0 1 0 1 0 0 1 0 1]
 [0 1 0 0 1 0 1 0 0 0]
 [1 1 1 1 1 1 0 1 0 1]
 [0 0 1 0 0 1 1 1 0 0]
 [0 0 0 0 1 1 0 1 1 0]
 [0 0 0 0 0 0 0 0 1 1]
 [0 1 0 1 0 1 0 0 0 0]
 [0 0 1 0 0 0 1 1 1 0]]

NOTE: rows are ordered from left to right, and columns are top to bottom.

tenhjo · Accepted Answer

Here is some numpy magic:

import numpy as np

a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
              [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])


a_pad = np.zeros((a.shape[0]+2, a.shape[1]+2))
a_pad[1:-1, 1:-1] = a

cols = [np.diff(np.nonzero(c)[0].reshape(-1, 2), axis=1)[:, 0]
        for c in np.diff(a_pad, axis=0).T[1:-1]]
# [array([3, 1]),  array([1, 2, 1]),  array([1, 1, 2, 1]), ...

rows = [np.diff(np.nonzero(r)[0].reshape(-1, 2), axis=1)[:, 0]
        for r in np.diff(a_pad, axis=1)[1:-1]]
# [array([4, 4]),  array([1, 1, 5]),  array([1, 1, 1, 1, 1]), ...

Now let's explore what happens on an example array (the last row, a[9, :]):

# a            [0, 0, 1,  0, 0, 0, 1, 1,  1, 0]
# pad       [0, 0, 0, 1,  0, 0, 0, 1, 1,  1, 0, 0]
# diff()       [0, 0, 1, -1, 0, 0, 1, 0, 0, -1, 0]
#                     ^   ^        ^         ^
# nonzero()          [2,  3,       6,        9]
# reshape() [[2, 3],
#            [6, 9]]
# diff()     [1, 3]

The idea is that once the binary array is padded with a zero at both ends, np.diff() marks the start and end of every sequence of ones: it is 1 where 0->1 and -1 where 1->0. np.nonzero() on that result therefore gives the indices of all start and end points. Because of the padding, every start (+1) is followed by exactly one end (-1), so starts and ends always alternate and their total count is even. That is why .reshape(-1, 2) puts the start points in the first column and the end points in the second. Applying np.diff() along axis=1 of this array gives the length of each sequence.
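The steps described above can also be wrapped in a small helper for a single row or column. This is only a sketch of the same idea, not the code from the answer: the name block_lengths is made up for illustration, and it uses np.pad and np.flatnonzero in place of the manual padding and reshape.

```python
import numpy as np


def block_lengths(v):
    """Return the lengths of the runs of consecutive ones in a 1-D 0/1 array.

    Hypothetical helper name; the input is assumed to contain only 0s and 1s.
    """
    # Cast to int (so boolean input also works) and pad one zero on each side,
    # which guarantees that every run has both a start and an end.
    padded = np.pad(np.asarray(v, dtype=int), 1, mode="constant")
    # np.diff is +1 at a 0->1 transition (start) and -1 at a 1->0 transition (end).
    edges = np.flatnonzero(np.diff(padded))
    # Starts and ends alternate, so even positions are starts, odd ones are ends.
    starts, ends = edges[::2], edges[1::2]
    return ends - starts


last_row = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]
print(block_lengths(last_row))  # [1 3]
```

With the matrix a from above, [block_lengths(r) for r in a] and [block_lengths(c) for c in a.T] should give the same per-row and per-column results as the list comprehensions in the answer.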
