Reputation: 3510
I'm working with a page segmentation algorithm. The output of the code writes an image with the pixels of each zone assigned a unique color. I'd like to process the image to find the bounding boxes of the zones. I need to find all the colors, then find all the pixels of that color, then find their bounding box.
The following is an example image.
I'm currently starting with histograms of the R,G,B channels. The histograms tell me data locations.
img = Image.open(imgfilename)
img.load()
r,g,b = img.split()
ra,ga,ba = [ np.asarray(p,dtype="uint8") for p in (r,g,b) ]
rhist,edges = np.histogram(ra,bins=256)
ghist,edges = np.histogram(ga,bins=256)
bhist,edges = np.histogram(ba,bins=256)
print np.nonzero(rhist)
print np.nonzero(ghist)
print np.nonzero(bhist)
Output: (array([ 0, 1, 128, 205, 255]),) (array([ 0, 20, 128, 186, 255]),) (array([ 0, 128, 147, 150, 255]),)
I'm a little flummoxed at this point. By visual inspection, I have colors (0,0,0),(1,0,0),(0,20,0),(128,128,128),etc. How should I permute the nonzero outputs into pixel values for np.where()?
I'm considering flattening the 3,row,col narray into a 2-D plane of 24-bit packed RGB values (r<<24|g<<16|b) and searching that array. That seems brute force and inelegant. Is there a better way in Numpy to find bounding boxes of a color value?
Upvotes: 3
Views: 3429
Reputation: 19241
There is no reason to consider this as a RGB color image, it is simply a visualization of a segmentation that someone else did. You can easily consider it as a grayscale image, and for these specific colors you don't have to do anything else yourself.
import sys
import numpy
from PIL import Image
img = Image.open(sys.argv[1]).convert('L')
im = numpy.array(img)
colors = set(numpy.unique(im))
colors.remove(255)
for color in colors:
py, px = numpy.where(im == color)
print(px.min(), py.min(), px.max(), py.max())
If you cannot rely on convert('L')
giving unique colors (i.e., you are using other colors beyond the ones in the given image), you can pack your image and obtain the unique colors:
...
im = numpy.array(img, dtype=int)
packed = im[:,:,0]<<16 | im[:,:,1]<<8 | im[:,:,2]
colors = set(numpy.unique(packed.ravel()))
colors.remove(255<<16 | 255<<8 | 255)
for color in colors:
py, px = numpy.where(packed == color)
print(px.min(), py.min(), px.max(), py.max())
I would also recommend removing small connected components before finding the bounding boxes, by the way.
Upvotes: 4
Reputation: 67457
EDIT Putting all together into a working program, using the image you posted:
from __future__ import division
import numpy as np
import itertools
from PIL import Image
img = np.array(Image.open('test_img.png'))
def bounding_boxes(img) :
r, g, b = [np.unique(img[..., j]) for j in (0, 1, 2)]
bounding_boxes = {}
for r0, g0, b0 in itertools.product(r, g, b) :
rows, cols = np.where((img[..., 0] == r0) &
(img[..., 1] == g0) &
(img[..., 2] == b0))
if len(rows) :
bounding_boxes[(r0, g0, b0)] = (np.min(rows), np.max(rows),
np.min(cols), np.max(cols))
return bounding_boxes
In [2]: %timeit bounding_boxes(img)
1 loops, best of 3: 30.3 s per loop
In [3]: bounding_boxes(img)
Out[3]:
{(0, 0, 255): (3011, 3176, 755, 2546),
(0, 128, 0): (10, 2612, 0, 561),
(0, 128, 128): (1929, 1972, 985, 1438),
(0, 255, 0): (10, 166, 562, 868),
(0, 255, 255): (2938, 2938, 680, 682),
(1, 0, 0): (10, 357, 987, 2591),
(128, 0, 128): (417, 1873, 984, 2496),
(205, 186, 150): (11, 56, 869, 1752),
(255, 0, 0): (3214, 3223, 570, 583),
(255, 20, 147): (2020, 2615, 956, 2371),
(255, 255, 0): (3007, 3013, 600, 752),
(255, 255, 255): (0, 3299, 0, 2591)}
Not very fast, even with the small number of colors being actually checked...
You can find the bounding box for color r0
, g0
, b0
with something along the lines of
rows, cols = np.where((ra == r0) & (ga == g0) & (ba == b0))
top, bottom = np.min(rows), np.max(rows)
left, right = np.min(cols), np.max(cols)
Rather than iterating through all 2**24
combinations of RGB colors, you can vastly reduce your search space using only the cartesian product of your non-zero histogram bins:
for r0, g0, b0 in itertools.product(np.nonzero(rhist),
np.nonzero(ghist),
np.nonzero(bhist)) :
You will have non-existing combinations leaking in, that you can filter out checking that rows
and cols
are not empty tuples. But in your example you would have reduced a search space of 2**24
combinations to just 125.
Upvotes: 2
Reputation: 20126
This is just a solution off the top of my head. You can iterate over the pixels in the image from say top-left to bottom-right and save a top
, bottom
, left
and right
values for each color. For a given color the top
value will be the first row you see with this color and the bottom
will be the last raw, the left
value will be the minimal column value for pixels in this color and the right
is the maximal column value you find.
Then, for each color you can draw a rectangle in from top-left
to bottom-right
in the desired color.
I don't know if this is qualified as a good bounding box algorithm, but I guess it's ok.
Upvotes: 0