Reputation: 3594
New to Python & Pandas.
I want to test whether a column contains any of several substrings and, where it does, create a new column holding the substring that matched.
For example, I have this df:
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', 'lemon-merengue', 'strawberry-tart'], 'type':['cocktail', 'dessert', 'dessert', 'dessert']})
and I have two regex strings:
fruit = "apple|mango|banana|peach"
recipe = "cocktail|dessert|appetizer"
I want to meet the following conditions:
df['foodstuff'].str.contains(fruit, case = False) & (df['type'].str.contains(recipe, case = False))
In this case, the output would look like:
pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', 'lemon-merengue', 'strawberry-tart'], 'type':['cocktail', 'dessert', 'dessert', 'dessert'], 'tag':['apple', 'apple', np.nan, np.nan ]})
I was trying to do it like this:
df['tag'] = np.where(df['foodstuff'].str.contains(fruit), fruit, np.nan)
but in that case, the 'tag' column takes the entire pattern string, apple|mango|banana|peach. I need just the part that matched.
Upvotes: 3
Views: 2124
Reputation: 863216
I think you need str.extract:
fruit = "apple|mango|banana|peach"
df['tag'] = df.foodstuff.str.extract('('+fruit+')', expand=False)
print(df)
         foodstuff      type    tag
0    apple-martini  cocktail  apple
1        apple-pie   dessert  apple
2   lemon-merengue   dessert    NaN
3  strawberry-tart   dessert    NaN
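The question also asks for the match to be case-insensitive and to depend on the 'type' column matching the recipe pattern. A minimal sketch of one way to combine both, assuming the tag should be NaN whenever either condition fails (the names matched and type_ok are only illustrative):

```python
import re

import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', 'lemon-merengue', 'strawberry-tart'],
    'type': ['cocktail', 'dessert', 'dessert', 'dessert'],
})

fruit = "apple|mango|banana|peach"
recipe = "cocktail|dessert|appetizer"

# Extract the first fruit that matches, ignoring case; rows without a match become NaN.
matched = df['foodstuff'].str.extract('(' + fruit + ')', flags=re.IGNORECASE, expand=False)

# Boolean mask: True where 'type' matches the recipe pattern (missing values count as False).
type_ok = df['type'].str.contains(recipe, case=False, na=False)

# Keep the extracted fruit only where the type condition also holds.
df['tag'] = matched.where(type_ok)
print(df)
```

With the sample data every 'type' value matches recipe, so the result is the same as with str.extract alone; the mask only changes the output for rows whose type falls outside the pattern.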
Upvotes: 3