vagabond

Reputation: 3594

Create a new column based on conditional regex matching in pandas

I'm new to Python and pandas.

I want to test whether the strings in a column match a regex and, where they do, create a new column holding the substring that matched.

For example, I have this DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', 'lemon-merengue', 'strawberry-tart'], 'type':['cocktail', 'dessert', 'dessert', 'dessert']})

and I have two regex strings:

fruit = "apple|mango|banana|peach"

recipe = "cocktail|dessert|appetizer"

I want both of the following conditions to hold for a row:

df['foodstuff'].str.contains(fruit, case=False) & df['type'].str.contains(recipe, case=False)

The desired output would look like this:

pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', 'lemon-merengue', 'strawberry-tart'], 'type':['cocktail', 'dessert', 'dessert', 'dessert'], 'tag':['apple', 'apple', np.nan, np.nan]})

I was trying to do it like this:

df['tag'] = np.where(df['foodstuff'].str.contains(fruit), fruit, np.nan)

But in that case the 'tag' column receives the entire pattern string, apple|mango|banana|peach, for every matching row. I need just the part that matched.

Upvotes: 3

Views: 2124

Answers (1)

jezrael

Reputation: 863216

I think you need str.extract, which returns the first match of the capture group for each row and NaN where nothing matches:

fruit = "apple|mango|banana|peach"
df['tag'] = df.foodstuff.str.extract('('+fruit+')', expand=False)
print(df)
         foodstuff      type    tag
0    apple-martini  cocktail  apple
1        apple-pie   dessert  apple
2   lemon-merengue   dessert    NaN
3  strawberry-tart   dessert    NaN
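
The snippet above only looks at the foodstuff column. If the type condition from the question should also apply, and the match should be case-insensitive like the str.contains calls there, one possible sketch is the following. The names matched and type_ok are hypothetical helpers introduced here, and using Series.where for the masking step is my assumption about what is wanted, not something the question confirms.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', 'lemon-merengue', 'strawberry-tart'],
    'type': ['cocktail', 'dessert', 'dessert', 'dessert'],
})

fruit = "apple|mango|banana|peach"
recipe = "cocktail|dessert|appetizer"

# First fruit found in each row, or NaN when nothing matches.
# (hypothetical helper name; flags=re.IGNORECASE mirrors case=False)
matched = df['foodstuff'].str.extract(f'({fruit})', flags=re.IGNORECASE, expand=False)

# True for rows whose type matches the recipe pattern; missing values count as False.
type_ok = df['type'].str.contains(recipe, case=False, na=False)

# Keep the matched fruit only where the type condition also holds.
df['tag'] = matched.where(type_ok)
print(df)
```

With the sample data every type already matches recipe, so the result should be the same as with str.extract alone; the mask only makes a difference once a row has a type outside the pattern.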

Upvotes: 3
