Reputation: 71
I'm trying to create a new DataFrame column (Column C) based on the values of two other columns. The two criteria are: if either Column A is > 0 OR Column B contains the string "Apple",* then Column C should have the value "Yes"; otherwise it should have the value "No".
*Bonus points if the answer is not case-sensitive (that is, it'll pick up the "apple" in "Pineapple" as well as in "Apple Juice").
Data might look like this (including what Column C should result in):
Column_A Column_B Column_C
23 Orange Juice Yes
2 Banana Smoothie Yes
8 Pineapple Juice Yes
0 Pineapple Smoothie Yes
0 Apple Juice Yes
0 Lemonade No
34 Coconut Water Yes
I've tried several things, including:
df['Keep6']= np.where((df['Column_A'] >0) | (df['Column_B'].find('Apple')>0) , 'Yes','No')
But I get the error message: "AttributeError: 'Series' object has no attribute 'find'"
Upvotes: 1
Views: 143
Reputation: 4497
Try this code, which uses the pandas.DataFrame.apply function:
df['Column_C'] = df.apply(lambda row: 'Yes' if (row['Column_A'] > 0) or (row['Column_B'].lower().find('apple') >= 0) else 'No', axis=1)
This gives:
Column_A Column_B Column_C
0 23 Orange Juice Yes
1 2 Banana Smoothie Yes
2 8 Pineapple Juice Yes
3 0 Pineapple Smoothie Yes
4 0 Apple Juice Yes
5 0 Lemonade No
6 34 Coconut Water Yes
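For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same row-wise idea. The DataFrame is rebuilt from the sample data in the question, and the helper name `label` is only illustrative. It uses the `in` operator on a lowercased string instead of `find`, which reads a little more directly.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column_A": [23, 2, 8, 0, 0, 0, 34],
    "Column_B": [
        "Orange Juice", "Banana Smoothie", "Pineapple Juice",
        "Pineapple Smoothie", "Apple Juice", "Lemonade", "Coconut Water",
    ],
})

def label(row):
    # Hypothetical helper: case-insensitive substring check on Column_B.
    has_apple = "apple" in str(row["Column_B"]).lower()
    return "Yes" if row["Column_A"] > 0 or has_apple else "No"

# axis=1 passes each row to the function as a Series.
df["Column_C"] = df.apply(label, axis=1)
print(df)
```

Note that apply with axis=1 calls the function once per row in Python, so on large frames it will generally be slower than a vectorized expression.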
Upvotes: 0
Reputation: 30940
Use Series.str.contains with case=False to make the match case-insensitive:
df['Column_C']= np.where((df['Column_A']>0) | (df['Column_B'].str.contains('apple', case=False)) ,'Yes','No')
print(df)
Column_A Column_B Column_C
0 23 Orange_Juice Yes
1 2 Banana_Smoothie Yes
2 8 Pineapple_Juice Yes
3 0 Pineapple_Smoothie Yes
4 0 Apple_Juice Yes
5 0 Lemonade No
6 34 Coconut_Water Yes
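A runnable sketch of the same approach, in case it helps. The first seven rows come from the question; the last row, with a missing Column_B value, is a hypothetical addition to show why passing na=False can be useful. Without it, str.contains returns a missing value for that row instead of False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column_A": [23, 2, 8, 0, 0, 0, 34, 0],
    "Column_B": [
        "Orange Juice", "Banana Smoothie", "Pineapple Juice",
        "Pineapple Smoothie", "Apple Juice", "Lemonade", "Coconut Water",
        None,  # hypothetical extra row (not in the question) with a missing value
    ],
})

# case=False ignores letter case; na=False treats missing strings as "no match".
has_apple = df["Column_B"].str.contains("apple", case=False, na=False)

# Element-wise OR of the two boolean Series, then map True/False to Yes/No.
df["Column_C"] = np.where((df["Column_A"] > 0) | has_apple, "Yes", "No")
print(df)
```

The original attempt failed because string methods such as find and contains are not defined on the Series itself; they live under the .str accessor.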
Upvotes: 1