Create new Python DataFrame column based on conditions of multiple other columns

Question

I'm trying to create a new DataFrame column (Column C) based on the values of two other columns. The rule is: if either Column A is > 0 OR Column B contains the string "Apple",* then Column C should have the value "Yes"; otherwise it should have the value "No".

*Bonus points if the answer is not case-sensitive (that is, it'll pick up the "apple" in "Pineapple" as well as in "Apple Juice").

The data might look like this (including what Column C should result in):

Column_A Column_B           Column_C  
23       Orange Juice       Yes  
2        Banana Smoothie    Yes  
8        Pineapple Juice    Yes  
0        Pineapple Smoothie Yes  
0        Apple Juice        Yes  
0        Lemonade           No  
34       Coconut Water      Yes

I've tried several things, including:

df['Keep6']= np.where((df['Column_A'] >0) | (df['Column_B'].find('Apple')>0) , 'Yes','No')

But I get the error message: "AttributeError: 'Series' object has no attribute 'find'"

ansev · Accepted Answer

Use Series.str.contains with case=False to make the match case-insensitive:

df['Column_C'] = np.where((df['Column_A'] > 0) | df['Column_B'].str.contains('apple', case=False), 'Yes', 'No')
print(df)

   Column_A            Column_B Column_C
0        23        Orange_Juice      Yes
1         2     Banana_Smoothie      Yes
2         8     Pineapple_Juice      Yes
3         0  Pineapple_Smoothie      Yes
4         0         Apple_Juice      Yes
5         0            Lemonade       No
6        34       Coconut_Water      Yes
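
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question (with spaces rather than underscores in Column_B) and applies the same np.where / str.contains approach. The has_apple variable name and the na=False argument are my own additions, not part of the answer above: na=False is an assumption that rows with a missing Column_B should count as "no apple" rather than propagate NaN into the condition.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column_A": [23, 2, 8, 0, 0, 0, 34],
    "Column_B": [
        "Orange Juice", "Banana Smoothie", "Pineapple Juice",
        "Pineapple Smoothie", "Apple Juice", "Lemonade", "Coconut Water",
    ],
})

# Case-insensitive substring match; na=False (an assumption) treats
# missing values in Column_B as "does not contain apple".
has_apple = df["Column_B"].str.contains("apple", case=False, na=False)

# "Yes" where either condition holds, "No" otherwise.
df["Column_C"] = np.where((df["Column_A"] > 0) | has_apple, "Yes", "No")

print(df)
```

Building the boolean mask on its own line is only a readability choice; inlining it inside np.where, as in the one-liner above, should behave the same way.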
