Python string to a colour

Question

I have a database of thousands of different colours. I want to map them to one of the colours I have in a list.

Previously the database held only a few hundred colours, and I managed with something like the code below. That is becoming unmaintainable: the set of unclassified colours keeps growing, and mapping them takes me a lot of time every week.

How can I improve this or what would be a better approach?

mapped_colours = ['Red', 'Green', 'Yellow', 'Blue', 'White', 'Black', 'Pink', 'Purple'...]

colour_map_dict = {
    'olive': 'Green',
    'khaki': 'Green'
}

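def classify_colour(colour):
    name = colour.lower()

    # First look for one of the basic colour names in the input.
    for mp in mapped_colours:
        if mp.lower() in name:
            return mp

    # Then fall back to known aliases such as 'olive' -> 'Green'.
    for alias, mapped in colour_map_dict.items():
        if alias in name:
            return mapped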

Here is an example of the data coming in.

 Resin Dark Wash Indi
 Filtered Canyon
 999 Black
 Winter White/Dove Grey
 Midnight/min
 White & black
 Green/White
 Red/White
 Multicolor
 royal blue
 Black Plum Grey
 Rose/ Gold
 Red And White
 Offwht/Gg
 Black Gunmetal
 Berry/Black
 Caramel
 Blue Stone Bleached
 All Tan
 Pale Blush
 Tee
 White / Multi
 00-black
 Flat Foundation
 Baby Blue
 Beige Melange

Martijn Pieters · Accepted Answer

I'd start with a decent colour dictionary to map names to colour definitions in a given colour space (like RGB or CMYK or HSV). There are various sets available on the internet; you'll have to do work up-front to obtain them and normalise the data from each to use the same colour space. The more sources you can obtain, the richer your mapping; you appear to have a load of fashion colours (paint? cloth?) in your input set, and (commercial) fashion is forever trying to differentiate by inventing new colour names.

Because a colour space is finite, you can then algorithmically partition that space into a limited set of groups. Each colour name then automatically will map to a given group.
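One simple way to partition the space, as a rough sketch: pick a reference point for each basic group and assign every colour to the nearest one. The reference RGB values and the `nearest_group` helper below are my own assumptions for illustration, and plain Euclidean distance in RGB is not perceptually uniform, so a different colour space may well give better groupings.

```python
# Reference RGB points for each basic group (assumed values, tune to taste).
BASIC_COLOURS = {
    "Red": (255, 0, 0),
    "Green": (0, 128, 0),
    "Yellow": (255, 255, 0),
    "Blue": (0, 0, 255),
    "White": (255, 255, 255),
    "Black": (0, 0, 0),
    "Pink": (255, 192, 203),
    "Purple": (128, 0, 128),
}


def nearest_group(rgb):
    """Return the basic colour whose reference point is closest to rgb."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, BASIC_COLOURS[name]))

    return min(BASIC_COLOURS, key=squared_distance)


print(nearest_group((128, 128, 0)))  # olive -> Green
print(nearest_group((0, 0, 128)))    # navy -> Blue
```

Every point in the space ends up in exactly one group, so any named colour you can resolve to RGB gets a group for free.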

Looking around a bit, a good starting point would be the Wikipedia lists of colour names. The compact list should be easily machine parseable, even in the basic HTML form, or you can use the MediaWiki API to get a raw format that's even easier to parse. Then perhaps add other standardised colour name dictionaries; the goal here is to get as many names as possible all mapping to the same colour space.

I'd store these names in a database table, and have a simple mathematical formula ready to divide the colour space into your basic groups. That way any colour in the table can be mapped to (say) RGB, and RGB to simple name.
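A minimal sketch of that table plus formula, assuming an in-memory SQLite database and a hue-band rule in HSV. The table name `colour_names`, the helpers `rgb_to_group` and `group_for_name`, and all the thresholds are hypothetical choices for illustration; you would tune the bands against your own data.

```python
import colorsys
import sqlite3

# Upper hue bound (degrees) for each group; assumed thresholds.
HUE_BANDS = [(20, "Red"), (70, "Yellow"), (170, "Green"),
             (260, "Blue"), (300, "Purple"), (340, "Pink")]


def rgb_to_group(r, g, b):
    """Bucket an RGB triple by brightness, then saturation, then hue."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.2:
        return "Black"
    if s < 0.15:
        return "White" if v > 0.8 else "Grey"
    hue = h * 360
    for upper, group in HUE_BANDS:
        if hue < upper:
            return group
    return "Red"  # hues close to 360 wrap around to red


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE colour_names (name TEXT PRIMARY KEY, r INTEGER, g INTEGER, b INTEGER)"
)
conn.executemany(
    "INSERT INTO colour_names VALUES (?, ?, ?, ?)",
    [("navy", 0, 0, 128), ("forest green", 34, 139, 34), ("ivory", 255, 255, 240)],
)


def group_for_name(name):
    """Look up a colour name and map its RGB value to a basic group."""
    row = conn.execute(
        "SELECT r, g, b FROM colour_names WHERE name = ?", (name.strip().lower(),)
    ).fetchone()
    return rgb_to_group(*row) if row else None


print(group_for_name("Navy"))          # Blue
print(group_for_name("forest green"))  # Green
```

Keeping the formula separate from the table means you can change the grouping later without touching the stored names.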

Next, build a simple spell-checker trained on your database of names, and run your input through that first. You have some pretty hard-to-work-with data there, but a trained colour name spell checker can probably clean up Offwht/Gg to something that can be matched. You can also use natural-language text search to find partial matches.
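As a stand-in for a properly trained spell-checker, here is a rough sketch that uses fuzzy matching from the standard library's `difflib` instead. The `KNOWN_NAMES` list is a tiny made-up sample of what your names table would hold, and `candidate_names` and the 0.6 cutoff are assumptions you would need to tune.

```python
import difflib
import re

# Tiny sample; in practice, load every name from your colour-name table.
KNOWN_NAMES = ["off white", "white", "black", "royal blue",
               "gunmetal", "caramel", "beige", "grey"]


def candidate_names(raw, cutoff=0.6):
    """Split raw input into alphabetic tokens and fuzzy-match each one."""
    tokens = [t for t in re.split(r"[^a-z]+", raw.lower()) if t]
    matches = []
    for token in tokens:
        # Keep only the single best match per token, if it clears the cutoff.
        matches.extend(
            difflib.get_close_matches(token, KNOWN_NAMES, n=1, cutoff=cutoff)
        )
    return matches


print(candidate_names("Offwht/Gg"))  # ['off white']
print(candidate_names("00-black"))   # ['black']
```

Tokens that match nothing (like `gg` here) are simply dropped, which gives you a natural list of leftovers to review by hand.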

Note that if you have image data alongside the colour names you receive, you could find the most prevalent colour in each image; that gives you another name (from your input data) -> colour space mapping to use.
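One possible way to find the most prevalent colour, sketched under the assumption that you have already decoded the image into a sequence of `(r, g, b)` tuples (with an imaging library of your choice). The `dominant_colour` helper and the bucket size of 32 are hypothetical; quantising first stops near-identical shades from splitting the vote.

```python
from collections import Counter


def dominant_colour(pixels, step=32):
    """Return the centre of the most common quantised RGB bucket."""
    def bucket(rgb):
        # Snap each channel to the middle of its step-wide bucket.
        return tuple((channel // step) * step + step // 2 for channel in rgb)

    counts = Counter(bucket(p) for p in pixels)
    return counts.most_common(1)[0][0]


# Mostly red pixels with some white background.
sample = [(250, 5, 5)] * 6 + [(245, 10, 0)] * 2 + [(255, 255, 255)] * 3
print(dominant_colour(sample))  # (240, 16, 16)
```

The resulting RGB value can then go through the same grouping formula as any named colour.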
