Reputation: 57
I have a pandas dataframe where one column contains only strings.
import pandas as pd

df = pd.DataFrame(
    {
        "A": [2, 4, 7, 17, 39],
        "B": ["apple", "apple", "broccoli", "rose", "apple"],
    }
)
I want to check column "B" for strings that contain a certain part of a word, and then create a new column "C" that says "fruit" whenever "app" is in the row, "flower" whenever "ros" shows up, and "vegetable" whenever "brocc" shows up.
The final dataframe should look like this:
df = pd.DataFrame(
    {
        "A": [2, 4, 7, 17, 39],
        "B": ["apple", "apple", "broccoli", "rose", "apple"],
        "C": ["fruit", "fruit", "vegetable", "flower", "fruit"],
    }
)
Upvotes: 2
Views: 268
Reputation: 152647
You could use a dictionary as a converter and pass its get method to apply:
converter = {'apple': 'fruit',
'broccoli': 'veg',
'rose': 'flower'}
df['C'] = df['B'].apply(converter.get)
print(df)
    A         B       C
0   2     apple   fruit
1   4     apple   fruit
2   7  broccoli     veg
3  17      rose  flower
4  39     apple   fruit
For partial matching you need to change this slightly:
converter = {'app': 'fruit',
'brocc': 'vegetable',
'ros': 'flower'}
df['C'] = df['B'].apply(lambda original: next(val for key, val in converter.items() if key in original))
print(df)
    A         B          C
0   2     apple      fruit
1   4     apple      fruit
2   7  broccoli  vegetable
3  17      rose     flower
4  39     apple      fruit
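The expression next(val for key, val in converter.items() if key in original) returns the dictionary value for the first key that is found in that row's string. If no key matches, next raises StopIteration unless you give it a default as a second argument, for example next((val for key, val in converter.items() if key in original), None).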
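If the column is large, the same partial matching can probably be done without a Python-level lambda. The following is only a sketch of one alternative, not part of the approach above: it builds a single regular expression from the dictionary keys, pulls out the matching fragment with str.extract, and translates it with map. The name pattern is introduced here for illustration. Note that the regex picks the leftmost match in each string, which may differ from the "first key in dictionary order" behaviour of next when several keys occur in the same string.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "A": [2, 4, 7, 17, 39],
        "B": ["apple", "apple", "broccoli", "rose", "apple"],
    }
)
converter = {"app": "fruit", "brocc": "vegetable", "ros": "flower"}

# Build one alternation pattern such as "(app|brocc|ros)".
# re.escape protects keys that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(key) for key in converter) + ")"

# str.extract returns the matched fragment per row (NaN if nothing matches);
# map then translates each fragment into its category.
df["C"] = df["B"].str.extract(pattern, expand=False).map(converter)
print(df)
```

Rows that contain none of the fragments end up as NaN in "C" instead of raising an error.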
Upvotes: 2
Reputation:
Create a dictionary:
d = {'apple': 'fruit', 'broccoli': 'vegetable', 'rose': 'flower'}
And use it with map or replace (map is faster for this):
df['B'].map(d)
Out:
0 fruit
1 fruit
2 vegetable
3 flower
4 fruit
Name: B, dtype: object
df['B'].replace(d)
Out:
0 fruit
1 fruit
2 vegetable
3 flower
4 fruit
Name: B, dtype: object
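The two methods give the same result here only because every value in "B" is a key of the dictionary. A small sketch of where they differ, using a made-up Series s with one value ("carrot") that is not in d:

```python
import pandas as pd

# "carrot" is a hypothetical value that has no entry in the dictionary.
s = pd.Series(["apple", "rose", "carrot"])
d = {"apple": "fruit", "broccoli": "vegetable", "rose": "flower"}

mapped = s.map(d)        # values without a dictionary entry become NaN
replaced = s.replace(d)  # values without a dictionary entry are kept as-is

print(mapped.tolist())
print(replaced.tolist())
```

So map is the safer choice when an unmatched value should be visible as missing, while replace keeps the original string. Both look up whole values only, so neither does partial matching with this dictionary.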
Upvotes: 3