lennard

Reputation: 563

Python string to a colour

I have a database of thousands of different colours. I want to map them to one of the colours I have in a list.

Previously this database held only a few hundred colours, and I managed the mapping with something like the code below. That has become unmaintainable: the database of unclassified colours keeps growing, and mapping them takes me a lot of time every week.

How can I improve this or what would be a better approach?

mapped_colours = ['Red', 'Green', 'Yellow', 'Blue', 'White', 'Black', 'Pink', 'Purple'...]

colour_map_dict = {
    'olive': 'Green',
    'khaki': 'Green'
}

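def classify_colour(colour):
    name = colour.lower()

    # First look for a basic colour name anywhere in the input
    for mp in mapped_colours:
        if mp.lower() in name:
            return mp

    # Otherwise fall back to the keyword map (e.g. 'olive' -> 'Green')
    for keyword, mapped in colour_map_dict.items():
        if keyword in name:
            return mapped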

Here is an example of the data coming in.

 Resin Dark Wash Indi
 Filtered Canyon
 999 Black
 Winter White/Dove Grey
 Midnight/min
 White & black
 Green/White
 Red/White
 Multicolor
 royal blue
 Black Plum Grey
 Rose/ Gold
 Red And White
 Offwht/Gg
 Black Gunmetal
 Berry/Black
 Caramel
 Blue Stone Bleached
 All Tan
 Pale Blush
 Tee
 White / Multi
 00-black
 Flat Foundation
 Baby Blue
 Beige Melange

Upvotes: 3

Views: 1059

Answers (2)

Wander Nauta

Reputation: 19695

Once you have a large database mapping names to correct answers (see Martijn's answer), you could use that database to train a classification algorithm, for example one from scikit-learn:

#!/usr/bin/env python3

from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

mapped_colours = ['Red', 'Green', 'Yellow', 'Blue', 'White', 'Black', 'Pink', 'Purple']

colour_map = [
    ('olive', 'Green'),
    ('khaki', 'Green'),
    ('snow white', 'White'),
    ('alice white', 'White'),
    ('pale blush', 'Pink'),
    ('baby blue', 'Blue'),
    ('midnight', 'Blue'),
    # ...and so on and so on - you'll need a lot of these
]

# A classifier classifies inputs into categories (colors in this case)
clf = svm.SVC(gamma=0.001, C=100.)

# A vectorizer turns strings into arrays which can be used as input
vectorizer = CountVectorizer()

# Train both the classifier and the vectorizer. This can take some time.
training = vectorizer.fit_transform([k for (k, v) in colour_map])
clf.fit(training, [mapped_colours.index(v) for (k, v) in colour_map])

# Predict some colors!
while True:
    query = input('Enter a color: ')
    guess = clf.predict(vectorizer.transform([query]))[0]
    print('Maybe', mapped_colours[guess])

Example run:

Enter a color: snow
Maybe White
Enter a color: dark khaki
Maybe Green
Enter a color: baby bedroom
Maybe Blue

You could alternatively have your model try to predict an RGB color, if your input data is already in RGB form, and work from there.

Because of the very short input, the classifier will likely not get very smart, but if the database is large enough it could perhaps make the job of adding colors a bit easier: if the classifier guesses correctly, just add its guess as a color. If not, you will still need to manually classify it, but the classifier will pick up the correct answer in future runs.


Disclaimer: I'm not sure if SVC is a right fit (heh) for your problem, but it might be Good Enough and worth a try.

Upvotes: 1

Martijn Pieters

Reputation: 1124548

I'd start with a decent colour dictionary to map names to colour definitions in a given colour space (like RGB or CMYK or HSV). There are various sets available on the internet; you'll have to do work up-front to obtain them and normalise the data from each to use the same colour space. The more sources you can obtain, the richer your mapping; you appear to have a load of fashion colours (paint? cloth?) in your input set, and (commercial) fashion is forever trying to differentiate by inventing new colour names.

Because a colour space is finite, you can then algorithmically partition that space into a limited set of groups. Each colour name then automatically will map to a given group.
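One way to do that partitioning, as a minimal sketch: pick a reference RGB value for each basic group and assign every colour to the nearest one. The reference values below are the CSS definitions of those names; `REFERENCE_COLOURS` and `nearest_group` are names made up for this example, and plain Euclidean distance in RGB is an assumption too (a perceptual colour space would likely group better).

```python
# Reference RGB value per basic group (CSS definitions of these names).
# The set of groups is illustrative; extend it to match your own list.
REFERENCE_COLOURS = {
    'Red': (255, 0, 0),
    'Green': (0, 128, 0),
    'Yellow': (255, 255, 0),
    'Blue': (0, 0, 255),
    'White': (255, 255, 255),
    'Black': (0, 0, 0),
    'Pink': (255, 192, 203),
    'Purple': (128, 0, 128),
}


def nearest_group(rgb, references=REFERENCE_COLOURS):
    """Return the group whose reference colour is closest to `rgb`.

    Uses squared Euclidean distance in RGB, which splits the colour
    cube into one region per reference colour.
    """
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, references[name]))

    return min(references, key=squared_distance)
```

With these reference values, olive (128, 128, 0) lands in Green, which matches the hand-written map in the question. Other names will not necessarily agree with your manual choices, so spot-check the results before trusting them.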

Looking around a bit, a good starting point would be the Wikipedia lists of colour names. The compact list should be easily machine parseable, even in the basic HTML form, or you can use the MediaWiki API to get a raw format that's even easier to parse. Then perhaps add other standardised colour name dictionaries; the goal here is to get as many names as possible all mapping to the same colour space.

I'd store these names in a database table, and have a simple mathematical formula ready to divide the colour space into your basic groups. That way any colour in the table can be mapped to (say) RGB, and RGB to simple name.
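A rough sketch of that table, using the standard-library `sqlite3` module. The table name, column names and helper functions are all hypothetical, and `dominant_channel_group` is a deliberately crude stand-in for whatever formula you settle on; it only shows where the formula plugs in.

```python
import sqlite3


def build_colour_table(rows):
    """Create an in-memory table from (name, r, g, b) rows.

    Names are stored lower-cased so lookups are case-insensitive.
    """
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE colour_names '
        '(name TEXT PRIMARY KEY, r INTEGER, g INTEGER, b INTEGER)'
    )
    conn.executemany(
        'INSERT INTO colour_names VALUES (?, ?, ?, ?)',
        [(name.lower(), r, g, b) for name, r, g, b in rows],
    )
    return conn


def lookup_rgb(conn, name):
    """Return the (r, g, b) tuple for a colour name, or None if unknown."""
    return conn.execute(
        'SELECT r, g, b FROM colour_names WHERE name = ?',
        (name.strip().lower(),),
    ).fetchone()


def dominant_channel_group(rgb):
    """Placeholder formula: label a colour by its strongest channel."""
    return ('Red', 'Green', 'Blue')[rgb.index(max(rgb))]


def classify(conn, name, to_group):
    """Map name -> RGB via the table, then RGB -> group via `to_group`."""
    rgb = lookup_rgb(conn, name)
    return None if rgb is None else to_group(rgb)
```

Usage would look something like `classify(conn, 'royal blue', dominant_channel_group)`. Names missing from the table come back as `None`, which gives you the list of inputs that still need manual attention.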

Next, build a simple spell-checker trained on your database of names, and run your input through that first. You have some pretty hard-to-work-with data there, but a trained colour name spell checker can probably clean up Offwht/Gg to something that can be matched. You can also use natural text search to find partial matches.
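As a very small stand-in for such a spell checker, the standard library's `difflib.get_close_matches` can already snap abbreviations onto known names. This is only a sketch: `KNOWN_NAMES` is a made-up sample of what your name table might contain, `split_parts` and `correct` are hypothetical helpers, and the `0.6` cutoff is just the `difflib` default, not a tuned value.

```python
import difflib
import re

# Hypothetical sample of names from the colour-name table.
KNOWN_NAMES = ['off white', 'white', 'black', 'gunmetal',
               'royal blue', 'midnight', 'multicolor']


def split_parts(raw):
    """Split inputs like 'Red/White' or 'Red And White' into lower-cased parts."""
    parts = re.split(r'\s*(?:/|&|\band\b)\s*', raw.strip().lower())
    return [part for part in parts if part]


def correct(part, known=KNOWN_NAMES, cutoff=0.6):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(part, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this sample list, `correct('offwht')` gives `'off white'`, while `correct('gg')` gives `None`, so fragments like that would still end up in the manual pile.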

Note that if you also receive image data with those colour names, you could find the most prevalent colour in each image, which gives you another name (from your input data) -> colour space mapping to use.
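Finding the most prevalent colour could be sketched like this, assuming the image has already been decoded into a sequence of `(r, g, b)` tuples by whatever imaging library you use. `dominant_colour` and the bucket size of 32 are illustrative choices, not a recommendation; bucketing is there so that near-identical shades are counted together.

```python
from collections import Counter


def dominant_colour(pixels, step=32):
    """Return the centre of the most common quantised RGB bucket.

    `pixels` is an iterable of 8-bit (r, g, b) tuples and must not be
    empty. Each channel is rounded down to a multiple of `step`, then
    shifted to the middle of that bucket.
    """
    def bucket(value):
        return (value // step) * step + step // 2

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    return counts.most_common(1)[0][0]
```

The resulting RGB triple can then be stored against the incoming colour name, just like an entry from any other name source. For product photos you would probably want to mask out the background first, otherwise a white backdrop may win.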

Upvotes: 3
