Reputation: 117
I want find the biggest submatrix which contains only negative numbers in a matrix, for example:
In
[[1, -9, -2, 8, 6, 1],
[8, -1,-11, -7, 6, 4],
[10, 12, -1, -9, -12, 14],
[8, 10, -3, -5, 17, 8],
[6, 4, 10, -13, -16, 19]]
the biggest submatrix containing only negative numbers is
[[-11, -7],
[-1, -9],
[-3,-5]]
(left upper corner coordinates: 1,2, right lower corner coordinates: 3,3).
What's the most effective way to do it?
Upvotes: 2
Views: 371
Reputation: 701
Below is a function that executes in much less than a second for a 5000x5000 matrix. It relies only on basic np-functions.
It could be improved by returning the first index instead of all indices. Several other optimizations can be done but for many uses it is fast enough.
from numpy import roll, where
def gidx(X):
Wl = X & roll(X, 1, axis=1)
T = X & Wl & roll(X, -1, axis=1)
if T[1:-1][1:-1].any():
N = T & roll(T, -1, axis=0) & roll(T, 1, axis=0)
if N.any(): return gidx(N)
W = Wl & roll(Wl, 1, axis=0)
if W.any(): return where(W)
return where(X)
#%% Example
import numpy as np
#np.random.seed(0)
M = 100
X = np.random.randn(M, M) - 2
X0 = (X < 0)
X0[[0, -1]], X0[:, [0, -1]] = False, False
jx, kx = gidx(X0)
Upvotes: 0
Reputation: 31
Here is my pretty fast solution using convolutions from OpenCV. Usage of float32 is required as it is much faster than integers. Takes 135 ms for 1000 x 1000 matrix on my 2 cores. There's space for further code optimization though.
import cv2
import numpy as np
data = """1 -9 -2 8 6 1
8 -1 -11 -7 6 4
10 12 -1 -9 -12 14
8 10 -3 -5 17 8
6 4 10 -13 -16 19"""
# matrix = np.random.randint(-128, 128, (1000, 1000), dtype=np.int32)
matrix = np.int32([line.split() for line in data.splitlines()])
def find_max_kernel(matrix, border=cv2.BORDER_ISOLATED):
max_area = 0
mask = np.float32(matrix < 0)
ones = np.ones_like(mask)
conv_x = np.zeros_like(mask)
conv_y = np.zeros_like(mask)
max_h, max_w = mask.shape
for h in range(1, max_h + 1):
cv2.filter2D(mask, -1, ones[:h, None, 0], conv_y, (0, 0), 0, border)
for w in range(1, max_w + 1):
area = h * w
if area > max_area:
cv2.filter2D(conv_y, -1, ones[None, 0, :w], conv_x, (0, 0), 0, border)
if conv_x.max() == area:
max_area, shape = area, (h, w)
else:
if w == 1:
max_h = h - 1
if h == 1:
max_w = w - 1
break
if h >= max_h:
break
cv2.filter2D(mask, -1, np.ones(shape, np.float32), conv_x, (0, 0), 0, border)
p1 = np.array(np.unravel_index(conv_x.argmax(), conv_x.shape))
p2 = p1 + shape - 1
return p1, p2
print(*find_max_kernel(matrix), sep='\n')
Upvotes: 2
Reputation: 2224
A brute force solution. Will work, but may be considered too slow for a bigger matrix:
mOrig = [[1, -9, -2, 8, 6, 1],
[8, -1,-11, -7, 6, 4],
[10, 12, -1, -9, -12, 14],
[8, 10, -3, -5, 17, 8],
[6, 4, 10, -13, -16, 19]]
# reduce the problem
# now we have a matrix that contains only 0 and 1
# at the place where there was a negative number
# there is now a 1 and at the places where a positive
# number had been there is now a 0. 0s are considered
# to be negative numbers, if you want to change this,
# change the x < 0 to x <= 0.
m = [[1 if x < 0 else 0 for x in z] for z in mOrig]
# now we have the problem to find the biggest submatrix
# consisting only 1s.
# first a function that checks if a submatrix only contains 1s
def containsOnly1s(m, x1, y1, x2, y2):
for i in range(x1, x2):
for j in range(y1, y2):
if m[i][j] == 0:
return False
return True
def calculateSize(x1, y1, x2, y2):
return (x2 - x1) * (y2 - y1)
best = (-1, -1, -1, -1, -1)
for x1 in range(len(m)):
for y1 in range(len(m[0])):
for x2 in range(x1, len(m)):
for y2 in range(y1, len(m[0])):
if containsOnly1s(m, x1, y1, x2, y2):
sizeOfSolution = calculateSize(x1, y1, x2, y2)
if best[4] < sizeOfSolution:
best = (x1, y1, x2, y2, sizeOfSolution)
for x in range(best[0], best[2]):
print("\t".join([str(mOrig[x][y]) for y in range(best[1], best[3])]))
Will output
-11 -7
-1 -9
-3 -5
In case something else is meant with "biggest submatrix", the only function that needs to get changed is the following:
def calculateSize(x1, y1, x2, y2):
return (x2 - x1) * (y2 - y1)
which is calculating the size of a submatrix.
Edit 1 ... first speedup
best = (-1, -1, -1, -1, -1)
for x1 in range(len(m)):
for y1 in range(len(m[0])):
if m[x1][y1] == 1: # The starting point must contain a 1
for x2 in range(x1 + 1, len(m)): # actually can start with x1 + 1 here
for y2 in range(y1 + 1, len(m[0])):
if containsOnly1s(m, x1, y1, x2, y2):
sizeOfSolution = calculateSize(x1, y1, x2, y2)
if best[4] < sizeOfSolution:
best = (x1, y1, x2, y2, sizeOfSolution)
else:
# There is at least one 0 in the matrix, so every greater
# matrix will also contain this 0
break
Edit 2
Ok, after converting the matrix into a matrix of 0 and 1 (as I do via the line m = [[1 if x < 0 else 0 for x in z] for z in mOrig]
the problem is the same as what is called the maximal rectangle problem
in literature. So I googled a bit about known algorithms for this kind of problem and came across this site here http://www.drdobbs.com/database/the-maximal-rectangle-problem/184410529 which is describing a very fast algorithm to solve this kind of problem. To summarize the points of this website, the algorithm is exploiting the structure. This can be done by using a stack in order to remember the structure profile which allows us to recalculate the width in case a narrow rectangle gets reused when an wider one gets closed.
Upvotes: 2