We have a database with more than 250,000 images and we'd like to search them by color, similar to how Google's search by color works. We'd define 12 different colors, ranging from black through red, green and blue to white. If the user selects red, for example, we'd like to return all images that contain clearly visible "reddish parts". By "reddish parts" I mean anything in a color range from deep red to maybe slightly purple.
The plan was to take an image, scale it down to 64x64 px and work with the HSL values of all pixels. This is how we intended to calculate the different color ranges:
import colorsys
from collections import Counter

from PIL import Image

# Downscale to 64x64 px. Image.ANTIALIAS was removed in Pillow 10;
# Image.LANCZOS is the same resampling filter under its current name.
image = Image.open('test.jpg').convert('RGB').resize((64, 64), Image.LANCZOS)

counts = Counter()
for r, g, b in image.getdata():
    # The thresholds below are meant for HSL, so use rgb_to_hls, not rgb_to_hsv:
    # HSV "value" is 100 for pure colors like (255, 0, 0), which the l > 95 test
    # would then count as white. Note the (h, l, s) return order.
    h, l, s = colorsys.rgb_to_hls(r / 255., g / 255., b / 255.)
    h, l, s = h * 360, l * 100, s * 100  # degrees, percent, percent

    # Achromatic colors first: lightness and saturation override the hue
    if l > 95:
        counts['white'] += 1
    elif l < 8:
        counts['black'] += 1
    elif s < 8:
        counts['gray'] += 1
    # Chromatic colors by hue; each branch only sees hues not matched above
    elif h < 12 or h > 349:
        counts['red'] += 1
    elif h < 35:
        # Orange and brown share this hue range and differ only in saturation
        counts['orange' if s > 70 else 'brown'] += 1
    elif h < 65:
        counts['yellow'] += 1
    elif h < 150:
        counts['green'] += 1
    elif h < 200:
        counts['turquoise'] += 1
    elif h < 250:
        counts['blue'] += 1
    elif h < 275:
        counts['lilac'] += 1
    else:
        counts['pink'] += 1

for name in ('white', 'black', 'gray', 'red', 'orange', 'brown', 'yellow',
             'green', 'turquoise', 'blue', 'lilac', 'pink'):
    print(name.capitalize() + ':', counts[name])
It works rather nicely with some images and fails horribly with others. The problem is that the perceived color does not depend only on the hue value, but also on brightness and saturation. For example, our definition of yellow fails completely for lower values of saturation/brightness: it simply turns greenish-brown and has nothing to do with yellow any more. But this is just one special case; brown turned out to share its hue range with orange and differ only in saturation... when looking at the whole picture, this system seems to become really complex.
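A quick way to see the effect described above is to convert a few colors by hand. This is a minimal sketch using only `colorsys`; the sample colors and the helper name `hls_degrees_percent` are arbitrary picks for illustration, not taken from our data. All four colors have the same hue, so a hue-only rule puts them in the same bucket, yet a person would likely name them yellow, olive, dark olive and khaki.

```python
import colorsys

# Hypothetical sample colors that people name differently but that share one hue
samples = {
    'bright yellow': (255, 255, 0),
    'olive': (128, 128, 0),
    'dark olive': (64, 64, 0),
    'pale khaki': (200, 200, 150),
}


def hls_degrees_percent(rgb):
    """Return (hue in degrees, lightness %, saturation %) for an 8-bit RGB tuple."""
    r, g, b = (c / 255.0 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note: colorsys returns (h, l, s)
    return round(h * 360), round(l * 100), round(s * 100)


for name, rgb in samples.items():
    print(name, hls_degrees_percent(rgb))
```

Every row reports a hue of 60; only lightness and saturation change, which is exactly the information a hue-first classifier throws away.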
I think I'm doing something wrong here. I tried RGB values and failed too. I also tried to find a better way with histograms, but failed due to dumbness or something...
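One possible alternative to hand-tuned hue ranges, offered as a swapped-in technique rather than anything from the question: define each of the 12 colors by a reference RGB value, convert everything to CIE L\*a\*b\* (a color space designed so that Euclidean distance roughly tracks perceived difference), assign every pixel to the nearest reference color (CIE76 distance), and count the hits as a histogram. The palette RGB values and all function names (`srgb_to_lab`, `nearest_color`, `color_histogram`) are hypothetical placeholders; real use would need tuning, probably with several reference shades per color name. Only the standard library is used.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical reference colors (8-bit sRGB); tune these, or add several per name
PALETTE = {
    'black': (0, 0, 0), 'gray': (128, 128, 128), 'white': (255, 255, 255),
    'red': (200, 30, 30), 'orange': (255, 140, 0), 'brown': (120, 70, 30),
    'yellow': (240, 220, 30), 'green': (40, 160, 50),
    'turquoise': (40, 200, 200), 'blue': (30, 70, 200),
    'lilac': (170, 120, 210), 'pink': (240, 130, 180),
}


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB tuple to CIE L*a*b* (D65 white point)."""
    def linear(c):
        # Undo the sRGB transfer curve
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    # Linear sRGB -> CIE XYZ, each axis normalised by the D65 reference white
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.00000
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883

    def f(t):
        # Cube root above the CIE threshold, linear segment below it
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


PALETTE_LAB = {name: srgb_to_lab(rgb) for name, rgb in PALETTE.items()}


@lru_cache(maxsize=None)  # photos repeat colors a lot, so cache per RGB tuple
def nearest_color(rgb):
    """Name of the palette entry closest to rgb by squared CIE76 distance."""
    lab = srgb_to_lab(rgb)
    return min(PALETTE_LAB,
               key=lambda n: sum((p - q) ** 2 for p, q in zip(lab, PALETTE_LAB[n])))


def color_histogram(pixels):
    """Count how many pixels fall nearest to each palette color (alpha ignored)."""
    return Counter(nearest_color(tuple(px[:3])) for px in pixels)
```

With Pillow this could be fed as `color_histogram(Image.open('test.jpg').convert('RGB').resize((64, 64)).getdata())`. Dark yellows then land on whichever reference color they actually look closest to, instead of being forced into "yellow" by their hue alone.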
Orange, red, blue, etc. could also be booleans, or anything else we can use in our database for retrieving search results. I'm trying to work with native Python libs + Pillow and would prefer not to use SciPy, NumPy or any other third-party library unless really necessary. I've looked at a lot of similar SO questions, but none of them helped; most answers I found to this problem came without useful example code.
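For the database side, per-color pixel counts can be reduced to booleans with a minimum-coverage threshold and, optionally, packed into one integer. This is a sketch under assumptions: the 5% threshold, the bit order, the function names `color_flags` and `flags_to_bitmask`, and the example counts are all made up for illustration.

```python
from collections import Counter

# Hypothetical fixed order; bit i of the mask stands for COLOR_NAMES[i]
COLOR_NAMES = ('black', 'gray', 'white', 'red', 'orange', 'brown',
               'yellow', 'green', 'turquoise', 'blue', 'lilac', 'pink')


def color_flags(counts, total, min_share=0.05):
    """Map each color name to True if it covers at least min_share of the pixels."""
    return {name: counts.get(name, 0) / total >= min_share for name in COLOR_NAMES}


def flags_to_bitmask(flags):
    """Pack the 12 booleans into a single integer."""
    mask = 0
    for i, name in enumerate(COLOR_NAMES):
        if flags[name]:
            mask |= 1 << i
    return mask


# Made-up counts for one 64x64 image (4,096 pixels in total)
counts = Counter({'red': 900, 'white': 2500, 'gray': 150, 'blue': 546})
flags = color_flags(counts, total=64 * 64)
print(sorted(name for name, on in flags.items() if on))
print(flags_to_bitmask(flags))
```

Gray covers under 4% here and is dropped, while red, white and blue are kept. A search for "red" then only needs to test bit 3 of the stored mask; storing one indexed boolean column per color is the simpler alternative if bitwise queries are inconvenient.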
Help! :-)