David Poole
David Poole

Reputation: 3510

Finding bounding boxes of RGB colors in image

I'm working with a page segmentation algorithm. The output of the code writes an image with the pixels of each zone assigned a unique color. I'd like to process the image to find the bounding boxes of the zones. I need to find all the colors, then find all the pixels of that color, then find their bounding box.

The following is an example image.

Example output image showing colored zones

I'm currently starting with histograms of the R,G,B channels. The histograms tell me data locations.

img = Image.open(imgfilename)
img.load()
r,g,b = img.split()

ra,ga,ba = [ np.asarray(p,dtype="uint8") for p in (r,g,b) ]

rhist,edges = np.histogram(ra,bins=256)
ghist,edges = np.histogram(ga,bins=256)
bhist,edges = np.histogram(ba,bins=256)
print np.nonzero(rhist)
print np.nonzero(ghist)
print np.nonzero(bhist)

Output: (array([ 0, 1, 128, 205, 255]),) (array([ 0, 20, 128, 186, 255]),) (array([ 0, 128, 147, 150, 255]),)

I'm a little flummoxed at this point. By visual inspection, I have colors (0,0,0),(1,0,0),(0,20,0),(128,128,128),etc. How should I permute the nonzero outputs into pixel values for np.where()?

I'm considering flattening the 3,row,col narray into a 2-D plane of 24-bit packed RGB values (r<<24|g<<16|b) and searching that array. That seems brute force and inelegant. Is there a better way in Numpy to find bounding boxes of a color value?

Upvotes: 3

Views: 3429

Answers (3)

mmgp
mmgp

Reputation: 19241

There is no reason to consider this as a RGB color image, it is simply a visualization of a segmentation that someone else did. You can easily consider it as a grayscale image, and for these specific colors you don't have to do anything else yourself.

import sys
import numpy
from PIL import Image

img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img) 
colors = set(numpy.unique(im))
colors.remove(255)

for color in colors:
    py, px = numpy.where(im == color)
    print(px.min(), py.min(), px.max(), py.max())

If you cannot rely on convert('L') giving unique colors (i.e., you are using other colors beyond the ones in the given image), you can pack your image and obtain the unique colors:

...
im = numpy.array(img, dtype=int)

packed = im[:,:,0]<<16 | im[:,:,1]<<8 | im[:,:,2]
colors = set(numpy.unique(packed.ravel()))
colors.remove(255<<16 | 255<<8 | 255)

for color in colors:
    py, px = numpy.where(packed == color)
    print(px.min(), py.min(), px.max(), py.max())

I would also recommend removing small connected components before finding the bounding boxes, by the way.

Upvotes: 4

Jaime
Jaime

Reputation: 67457

EDIT Putting all together into a working program, using the image you posted:

from __future__ import division
import numpy as np
import itertools
from PIL import Image

img = np.array(Image.open('test_img.png'))

def bounding_boxes(img) :
    r, g, b = [np.unique(img[..., j]) for j in (0, 1, 2)]
    bounding_boxes = {}
    for r0, g0, b0 in itertools.product(r, g, b) :
        rows, cols = np.where((img[..., 0] == r0) &
                              (img[..., 1] == g0) &
                              (img[..., 2] == b0))
        if len(rows) :
            bounding_boxes[(r0, g0, b0)] = (np.min(rows), np.max(rows),
                                            np.min(cols), np.max(cols))
    return bounding_boxes

In [2]: %timeit bounding_boxes(img)
1 loops, best of 3: 30.3 s per loop

In [3]: bounding_boxes(img)
Out[3]: 
{(0, 0, 255): (3011, 3176, 755, 2546),
 (0, 128, 0): (10, 2612, 0, 561),
 (0, 128, 128): (1929, 1972, 985, 1438),
 (0, 255, 0): (10, 166, 562, 868),
 (0, 255, 255): (2938, 2938, 680, 682),
 (1, 0, 0): (10, 357, 987, 2591),
 (128, 0, 128): (417, 1873, 984, 2496),
 (205, 186, 150): (11, 56, 869, 1752),
 (255, 0, 0): (3214, 3223, 570, 583),
 (255, 20, 147): (2020, 2615, 956, 2371),
 (255, 255, 0): (3007, 3013, 600, 752),
 (255, 255, 255): (0, 3299, 0, 2591)}

Not very fast, even with the small number of colors being actually checked...


You can find the bounding box for color r0, g0, b0 with something along the lines of

rows, cols = np.where((ra == r0) & (ga == g0) & (ba == b0))
top, bottom = np.min(rows), np.max(rows)
left, right = np.min(cols), np.max(cols)

Rather than iterating through all 2**24 combinations of RGB colors, you can vastly reduce your search space using only the cartesian product of your non-zero histogram bins:

for r0, g0, b0 in itertools.product(np.nonzero(rhist),
                                    np.nonzero(ghist),
                                    np.nonzero(bhist)) :

You will have non-existing combinations leaking in, that you can filter out checking that rows and cols are not empty tuples. But in your example you would have reduced a search space of 2**24combinations to just 125.

Upvotes: 2

zenpoy
zenpoy

Reputation: 20126

This is just a solution off the top of my head. You can iterate over the pixels in the image from say top-left to bottom-right and save a top, bottom, left and right values for each color. For a given color the top value will be the first row you see with this color and the bottom will be the last raw, the left value will be the minimal column value for pixels in this color and the right is the maximal column value you find.

Then, for each color you can draw a rectangle in from top-left to bottom-right in the desired color.

I don't know if this is qualified as a good bounding box algorithm, but I guess it's ok.

Upvotes: 0

Related Questions