IHav
IHav

Reputation: 117

Finding biggest negative submatrix in python

I want find the biggest submatrix which contains only negative numbers in a matrix, for example:

In

[[1,  -9, -2,   8,  6,  1],    
 [8,  -1,-11,  -7,  6,  4],    
 [10, 12, -1,  -9, -12, 14],    
 [8, 10, -3,  -5,  17,  8],    
 [6,  4, 10, -13, -16, 19]]

the biggest submatrix containing only negative numbers is

[[-11, -7],
 [-1, -9],
 [-3,-5]]

(left upper corner coordinates: 1,2, right lower corner coordinates: 3,3).

What's the most effective way to do it?

Upvotes: 2

Views: 371

Answers (3)

Hunaphu
Hunaphu

Reputation: 701

Below is a function that executes in much less than a second for a 5000x5000 matrix. It relies only on basic np-functions.

It could be improved by returning the first index instead of all indices. Several other optimizations can be done but for many uses it is fast enough.

from numpy import roll, where
def gidx(X):
    Wl = X & roll(X, 1, axis=1)
    T = X & Wl & roll(X, -1, axis=1)
    if T[1:-1][1:-1].any():
        N = T & roll(T, -1, axis=0) & roll(T, 1, axis=0)
        if N.any(): return gidx(N)
    W = Wl & roll(Wl, 1, axis=0)
    if W.any(): return where(W)
    return where(X)

#%% Example
import numpy as np
#np.random.seed(0)
M = 100
X = np.random.randn(M, M) - 2

X0 = (X < 0)
X0[[0, -1]], X0[:, [0, -1]] = False, False
jx, kx = gidx(X0)

Upvotes: 0

Andrey Dugin
Andrey Dugin

Reputation: 31

Here is my pretty fast solution using convolutions from OpenCV. Usage of float32 is required as it is much faster than integers. Takes 135 ms for 1000 x 1000 matrix on my 2 cores. There's space for further code optimization though.

import cv2
import numpy as np

data = """1 -9 -2 8 6 1
8 -1 -11 -7 6 4
10 12 -1 -9 -12 14
8 10 -3 -5 17 8
6 4 10 -13 -16 19"""

# matrix = np.random.randint(-128, 128, (1000, 1000), dtype=np.int32)
matrix = np.int32([line.split() for line in data.splitlines()])

def find_max_kernel(matrix, border=cv2.BORDER_ISOLATED):
    max_area = 0
    mask = np.float32(matrix < 0)
    ones = np.ones_like(mask)
    conv_x = np.zeros_like(mask)
    conv_y = np.zeros_like(mask)
    max_h, max_w = mask.shape
    for h in range(1, max_h + 1):
        cv2.filter2D(mask, -1, ones[:h, None, 0], conv_y, (0, 0), 0, border)
        for w in range(1, max_w + 1):
            area = h * w
            if area > max_area:
                cv2.filter2D(conv_y, -1, ones[None, 0, :w], conv_x, (0, 0), 0, border)
                if conv_x.max() == area:
                    max_area, shape = area, (h, w)
                else:
                    if w == 1:
                        max_h = h - 1
                    if h == 1:
                        max_w = w - 1
                    break
        if h >= max_h:
            break
    cv2.filter2D(mask, -1, np.ones(shape, np.float32), conv_x, (0, 0), 0, border)
    p1 = np.array(np.unravel_index(conv_x.argmax(), conv_x.shape))
    p2 = p1 + shape - 1            
    return p1, p2

print(*find_max_kernel(matrix), sep='\n')

Upvotes: 2

quant
quant

Reputation: 2224

A brute force solution. Will work, but may be considered too slow for a bigger matrix:

mOrig = [[1,  -9, -2,   8,  6,  1],
    [8,  -1,-11,  -7,  6,  4],
    [10, 12, -1,  -9, -12, 14],
    [8, 10, -3,  -5,  17,  8],
    [6,  4, 10, -13, -16, 19]]

# reduce the problem
# now we have a matrix that contains only 0 and 1
# at the place where there was a negative number
# there is now a 1 and at the places where a positive
# number had been there is now a 0. 0s are considered
# to be negative numbers, if you want to change this,
# change the x < 0 to x <= 0.
m = [[1 if x < 0 else 0 for x in z] for z in mOrig]

# now we have the problem to find the biggest submatrix
# consisting only 1s.

# first a function that checks if a submatrix only contains 1s
def containsOnly1s(m, x1, y1, x2, y2):
    for i in range(x1, x2):
        for j in range(y1, y2):
            if m[i][j] == 0:
                return False
    return True

def calculateSize(x1, y1, x2, y2):
    return (x2 - x1) * (y2 - y1)

best = (-1, -1, -1, -1, -1)
for x1 in range(len(m)):
    for y1 in range(len(m[0])):
        for x2 in range(x1, len(m)):
            for y2 in range(y1, len(m[0])):
                if containsOnly1s(m, x1, y1, x2, y2):
                    sizeOfSolution = calculateSize(x1, y1, x2, y2)
                    if best[4] < sizeOfSolution:
                        best = (x1, y1, x2, y2, sizeOfSolution)

for x in range(best[0], best[2]):
    print("\t".join([str(mOrig[x][y]) for y in range(best[1], best[3])]))

Will output

-11 -7
-1  -9
-3  -5

In case something else is meant with "biggest submatrix", the only function that needs to get changed is the following:

def calculateSize(x1, y1, x2, y2):
    return (x2 - x1) * (y2 - y1)

which is calculating the size of a submatrix.

Edit 1 ... first speedup

best = (-1, -1, -1, -1, -1)
for x1 in range(len(m)):
    for y1 in range(len(m[0])):
        if m[x1][y1] == 1: # The starting point must contain a 1
            for x2 in range(x1 + 1, len(m)): # actually can start with x1 + 1 here
                for y2 in range(y1 + 1, len(m[0])):
                    if containsOnly1s(m, x1, y1, x2, y2):
                        sizeOfSolution = calculateSize(x1, y1, x2, y2)
                        if best[4] < sizeOfSolution:
                            best = (x1, y1, x2, y2, sizeOfSolution)
                    else:
                        # There is at least one 0 in the matrix, so every greater
                        # matrix will also contain this 0
                        break

Edit 2

Ok, after converting the matrix into a matrix of 0 and 1 (as I do via the line m = [[1 if x < 0 else 0 for x in z] for z in mOrig] the problem is the same as what is called the maximal rectangle problem in literature. So I googled a bit about known algorithms for this kind of problem and came across this site here http://www.drdobbs.com/database/the-maximal-rectangle-problem/184410529 which is describing a very fast algorithm to solve this kind of problem. To summarize the points of this website, the algorithm is exploiting the structure. This can be done by using a stack in order to remember the structure profile which allows us to recalculate the width in case a narrow rectangle gets reused when an wider one gets closed.

Upvotes: 2

Related Questions