Ariel Yael
Ariel Yael

Reputation: 373

numpy get the blocks making an array in a matrix

I have a numpy matrix such as the one below. What I'd like to do is get the arrays containing the blocks making each row/column here. How can this be efficiently done in numpy?

Examples

So if for example we have the array [1 1 1 1 0 1 1 1 1 0] (first row) then we would get [4 4] since we have 2 blocks of 4.

For the first column we would get [3 1] since we have at the start three 1-s, followed by a zero, then one 1 and then more zeros.

The mentioned matrix

[[1 1 1 1 0 1 1 1 1 0]
 [1 0 0 1 0 1 1 1 1 1]
 [1 0 1 0 1 0 0 1 0 1]
 [0 1 0 0 1 0 1 0 0 0]
 [1 1 1 1 1 1 0 1 0 1]
 [0 0 1 0 0 1 1 1 0 0]
 [0 0 0 0 1 1 0 1 1 0]
 [0 0 0 0 0 0 0 0 1 1]
 [0 1 0 1 0 1 0 0 0 0]
 [0 0 1 0 0 0 1 1 1 0]]

NOTE: rows are ordered from left to right, and columns are top to bottom.

Upvotes: 4

Views: 465

Answers (2)

mathfux
mathfux

Reputation: 5949

This is a way to do this row-wise:

def get_blocks(a):
    x = np.diff(a,prepend=0,append=0)
    groups, starts = np.nonzero(x*x+x)
    groups, ends = np.nonzero(x*x-x)
    values = ends - starts
    markers = np.diff(groups, prepend=0)
    marker_idx = np.nonzero(marker_idx)[0]
    return np.split(values, marker_idx)

Sample run:

>>> a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
   [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
   [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
   [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
   [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
   [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
   [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
   [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
   [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
   [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])
>>> get_blocks(a)
[array([4, 4], dtype=int32), 
array([1, 1, 5], dtype=int32), 
array([1, 1, 1, 1, 1], dtype=int32), 
array([1, 1, 1], dtype=int32), 
array([6, 1, 1], dtype=int32), 
array([1, 3], dtype=int32), 
array([2, 2], dtype=int32), 
array([2], dtype=int32), 
array([1, 1, 1], dtype=int32), 
array([1, 3], dtype=int32)]

It works for columns as well if you call it on A.T

Upvotes: 1

tenhjo
tenhjo

Reputation: 4537

Here is some numpy magic:

a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
              [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])


a_pad = np.zeros((a.shape[0]+2, a.shape[1]+2))
a_pad[1:-1, 1:-1] = a

cols = [np.diff(np.nonzero(c)[0].reshape(-1, 2), axis=1)[:, 0]
        for c in np.diff(a_pad, axis=0).T[1:-1]]
# [array([3, 1]),  array([1, 2, 1]),  array([1, 1, 2, 1]), ...

rows = [np.diff(np.nonzero(r)[0].reshape(-1, 2), axis=1)[:, 0]
        for r in np.diff(a_pad, axis=1)[1:-1]]
# [array([4, 4]),  array([1, 1, 5]),  array([1, 1, 1, 1, 1]), ...

Now let's explore what happens on an example array (a[5, :]):

# a            [0, 0, 1,  0, 0, 0, 1, 1,  1, 0,]
# pad       [0, 0, 0, 1,  0, 0, 0, 1, 1,  1, 0, 0]
# diff()       [0, 0, 1, -1, 0, 0, 1, 0, 0, -1, 0]
#                     ^   ^        ^         ^
# nonzero()          [2,  3,       6,        9]
# reshape() [[2, 3],
#            [6, 9]]
# diff()     [1, 3]

The idea is that when padding the binary array with zeros at both ends one can find the start and end of each sequence of ones easily by applying np.diff() (1 where 0->1 and -1 where 1->0). Therefore np.nonzero(np.diff()) gives the indices of the start and end points of each sequence. Additionally we know that starts (+1) and ends (-1) must always alternate. So np.reshape(-1, 2) gives us the start points in the first column and the end points in the second. Applying np.diff() again on this array gives the length of each sequence.

Upvotes: 2

Related Questions