Reputation: 87
I'm trying to iterate through a DataFrame column to extract a certain set of words. I'm mapping these as key-value pairs in a dictionary and have, with some help, managed to set one key per row so far.
Now I would like to return multiple keys in the same row if their values are present in the string, separated by a | (pipe).
Code:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress', 'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                return key
    else:
        return np.nan

df['Colour'] = df['Name'].apply(fetchColours)
Output:
   Name                               Colour
0  Red and Blue Lace Midi Dress       red
1  Long Armed Sweater Azure and Ruby  blue
2  High Top Ruby Sneakers             red
3  Tight Indigo Jeans                 blue
4  T-Shirt Navy and Rose              blue
Expected result:
   Name                               Colour
0  Red and Blue Lace Midi Dress       red
1  Long Armed Sweater Azure and Ruby  blue|red
2  High Top Ruby Sneakers             red
3  Tight Indigo Jeans                 blue
4  T-Shirt Navy and Rose              blue|red
Upvotes: 0
Views: 178
Reputation: 3648
The problem is that you return as soon as the first key is found, whereas you should keep searching until all matches have been collected:
def fetchColours(x):
    keys = []
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                keys.append(key)
                break  # one hit per key is enough; avoids results like 'red|red'
    if len(keys) != 0:
        return '|'.join(keys)
    else:
        return np.nan
For this to work you have to change:
colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}
to
colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'blue', 'indigo', 'navy')}
Otherwise the term 'blue' is never searched for, so the key 'blue' cannot be added to the list for the first example.
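Putting the pieces together, a minimal runnable sketch might look like the following. It assumes Python 3.7+, where dictionaries keep insertion order, so the keys are joined in the order they appear in `colour` (giving `red|blue` rather than the `blue|red` shown in the expected result). The name `fetch_colours` and the use of `any()` to add each key at most once are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress',
                            'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers',
                            'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

# 'blue' is listed among its own values so that the word itself is matched.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}


def fetch_colours(name):
    """Return all matching colour keys joined by '|', or NaN if none match."""
    lowered = name.lower()  # lower-case once instead of once per comparison
    keys = []
    for key, values in colour.items():
        # any() stops at the first matching value, so each key is added once.
        if any(value in lowered for value in values):
            keys.append(key)
    return '|'.join(keys) if keys else np.nan


df['Colour'] = df['Name'].apply(fetch_colours)
print(df)
```

If the `blue|red` ordering matters, one option is to return `'|'.join(sorted(keys))` instead, which orders the keys alphabetically.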
Upvotes: 2
Reputation: 77
How about this:
def fetchColors(x):
    color_keys = []
    # `colour` is the dictionary defined in the question
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                color_keys.append(key)
                break  # avoid adding the same key twice
    if color_keys:
        return '|'.join(color_keys)
    else:
        return np.nan
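One caveat with the `value in x.lower()` test is that it matches substrings, so a name such as 'Tailored Navy Jacket' would also be tagged red because "tailored" ends in "red". If that matters for your data, a possible variation is to match whole words with the standard `re` module. This is only a sketch: `patterns` and `fetch_colours_whole_word` are hypothetical names, and `colour` is assumed to already include 'blue' among the values for the blue key.

```python
import re

import numpy as np

colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

# One compiled pattern per key; \b anchors each term at word boundaries.
patterns = {
    key: re.compile(r'\b(?:' + '|'.join(map(re.escape, values)) + r')\b',
                    re.IGNORECASE)
    for key, values in colour.items()
}


def fetch_colours_whole_word(name):
    """Return keys whose terms occur as whole words, joined by '|', else NaN."""
    keys = [key for key, pattern in patterns.items() if pattern.search(name)]
    return '|'.join(keys) if keys else np.nan


print(fetch_colours_whole_word('Tailored Navy Jacket'))          # blue
print(fetch_colours_whole_word('Red and Blue Lace Midi Dress'))  # red|blue
```

It can be applied in the same way as the other versions, for example `df['Colour'] = df['Name'].apply(fetch_colours_whole_word)`.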
Upvotes: 0