Bob Harris

Reputation: 87

How to return multiple keys in a string if a given string matches the key's values in a dictionary

I'm trying to iterate through a dataframe column to extract a certain set of words. I'm mapping these as key-value pairs in a dictionary and have, with some help, managed to set one key per row so far.

Now, what I would like to do is return multiple keys in the same row if the values are present in the string and these should be separated by a | (pipe).

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress', 'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                return key
    else:
        return np.nan

df['Colour'] = df['Name'].apply(fetchColours)

Output:

        Name                                 Colour
0       Red and Blue Lace Midi Dress         red
1       Long Armed Sweater Azure and Ruby    blue
2       High Top Ruby Sneakers               red
3       Tight Indigo Jeans                   blue
4       T-Shirt Navy and Rose                blue

Expected result:

        Name                                 Colour
0       Red and Blue Lace Midi Dress         red
1       Long Armed Sweater Azure and Ruby    blue|red
2       High Top Ruby Sneakers               red
3       Tight Indigo Jeans                   blue
4       T-Shirt Navy and Rose                blue|red

Upvotes: 0

Views: 178

Answers (2)

Nathan

Reputation: 3648

The problem is that you return as soon as the first key is found, whereas you should keep searching until all matching keys have been collected:

def fetchColours(x):
    keys = []
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                keys.append(key)
                break  # one match per key is enough; avoids repeats such as 'red|red'
    if len(keys) != 0:
        return '|'.join(keys)
    else:
        return np.nan

For this to work, you also have to change:

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

to

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'blue', 'indigo', 'navy')}

Otherwise the term 'blue' itself is never searched for, so the key 'blue' cannot be added for the first row ('Red and Blue Lace Midi Dress').
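To see the whole thing end to end, here is a minimal, self-contained sketch of the same idea. It assumes the extended `colour` mapping (with 'blue' added). The name `fetch_colours` and the use of `sorted` are my own choices for illustration, not something from the question; sorting is only there so that rows with both colours come out as 'blue|red', as in the expected result. Note that row 0 then also becomes 'blue|red', because its name contains "Blue".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress',
                            'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers',
                            'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

# Assumption: 'blue' has been added to its own tuple of search terms.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

def fetch_colours(name):
    """Return all matching keys joined by '|', or NaN if none match."""
    lowered = name.lower()
    # any() stops at the first matching term, so each key appears at most once.
    keys = [key for key, values in colour.items()
            if any(value in lowered for value in values)]
    # sorted() gives a stable alphabetical order, e.g. 'blue|red'.
    return '|'.join(sorted(keys)) if keys else np.nan

df['Colour'] = df['Name'].apply(fetch_colours)
print(df)
```

If you would rather keep the order of the dictionary keys, drop the `sorted` call.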

Upvotes: 2

Yoav Gaudin

Reputation: 77

How about this:

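def fetchColors(x):
    color_keys = []
    # The dictionary in the question is named `colour`
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                color_keys.append(key)
                break  # stop after the first match so a key is not added twice
    if color_keys:
        return '|'.join(color_keys)
    else:
        return np.nan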
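One caveat with `value in x.lower()` is that it matches substrings, so a name like "Bored Navy Hoodie" would also pick up 'red'. If that matters for your data, a possible variation is to match whole words with a regular expression. This is only a sketch: `patterns` and `fetch_colours_whole_words` are hypothetical names, the `colour` mapping is assumed to include 'blue', and `math.nan` is used in place of `np.nan` just to keep the snippet free of dependencies (pandas treats both as missing).

```python
import math
import re

# Assumption: same mapping as in the question, with 'blue' added as a term.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

# One compiled pattern per key; \b restricts matches to whole words.
patterns = {
    key: re.compile(r'\b(?:' + '|'.join(map(re.escape, values)) + r')\b',
                    re.IGNORECASE)
    for key, values in colour.items()
}

def fetch_colours_whole_words(name):
    """Return keys whose terms occur as whole words, joined by '|', else NaN."""
    keys = [key for key, pattern in patterns.items() if pattern.search(name)]
    return '|'.join(keys) if keys else math.nan
```

It can be applied the same way, with `df['Name'].apply(fetch_colours_whole_words)`. The trade-off is that derived forms such as "Reds" or "Reddish" no longer match.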

Upvotes: 0
