Reputation: 402
I need to fill cells in one column based on whether another column contains a certain string.
Specifically, I need to fill column B based on what's in C. If C contains 'hello;', the corresponding cell in B should be filled with 'greet'. If C contains 'bye;', the corresponding cell in B should be filled with 'farewell'.
df1
   A  B            C  D
0  w      hello; Jon  q
1  x        bye; Jon  r
2  y     hello; Jack  s
3  z       bye; Jack  t
df1['B'] = np.where(df1['C'].str.contains('hello;'), 'greet', '')
df1['B'] = np.where(df1['C'].str.contains('bye;'), 'farewell', '')
Each line works on its own, but the second line overwrites the 'greet' values written by the first. I'm not sure how to combine the conditions so they don't overwrite each other. The end result I want is:
df1
   A         B            C  D
0  w     greet   hello; Jon  q
1  x  farewell     bye; Jon  r
2  y     greet  hello; Jack  s
3  z  farewell    bye; Jack  t
Upvotes: 1
Views: 763
Reputation: 1271
If there are only two possible cases and every value in the column matches one of them, as in the example, then a single np.where is enough:
df1['B'] = np.where(df1['C'].str.contains('bye;'), 'farewell', 'greet')
From the numpy docs:
numpy.where(condition[, x, y])
Return elements chosen from x or y depending on condition.
Where the condition is satisfied it returns x, otherwise it fills with y.
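A minimal sketch of that behaviour on the question's sample data, assuming pandas and NumPy are imported as pd and np. The fifth row ('hi; Jill') is hypothetical and is there only to show that anything not matching 'bye;' also falls through to 'greet', which is why this shortcut is only safe for a true two-way choice:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, plus one hypothetical row that matches neither string.
df1 = pd.DataFrame({
    "A": ["w", "x", "y", "z", "v"],
    "C": ["hello; Jon", "bye; Jon", "hello; Jack", "bye; Jack", "hi; Jill"],
    "D": ["q", "r", "s", "t", "u"],
})

# Rows where C contains 'bye;' get 'farewell'; every other row gets 'greet'.
df1["B"] = np.where(df1["C"].str.contains("bye;"), "farewell", "greet")

result = df1["B"].tolist()
print(result)  # ['greet', 'farewell', 'greet', 'farewell', 'greet']
```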
However, np.select is the one you want if you have more than one condition:
conditions = [
    df1['C'].str.contains('hello;'),
    df1['C'].str.contains('bye;'),
]
np.select(conditions, ['greet', 'farewell'])
array(['greet', 'farewell', 'greet', 'farewell'], dtype='<U11')
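Put together on the question's frame, this might look like the sketch below. The default='' argument is an assumption and not part of the snippet above: it mirrors the empty-string fallback in the question's np.where calls, and it avoids relying on np.select's numeric default of 0, which recent NumPy releases may refuse to combine with string choices.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with B initially empty.
df1 = pd.DataFrame({
    "A": ["w", "x", "y", "z"],
    "B": ["", "", "", ""],
    "C": ["hello; Jon", "bye; Jon", "hello; Jack", "bye; Jack"],
    "D": ["q", "r", "s", "t"],
})

conditions = [
    df1["C"].str.contains("hello;"),
    df1["C"].str.contains("bye;"),
]
choices = ["greet", "farewell"]

# default="" (an assumption, see the note above) fills rows matching no condition.
df1["B"] = np.select(conditions, choices, default="")

result = df1["B"].tolist()
print(result)  # ['greet', 'farewell', 'greet', 'farewell']
```

Because both conditions are evaluated before anything is assigned, neither result overwrites the other.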
Upvotes: 2
Reputation: 3739
Try using np.select:
m1 = df1['C'].str.contains('hello;')
m2 = df1['C'].str.contains('bye;')
# default='' leaves B empty where neither mask matches
df1['B'] = np.select(condlist=[m1, m2],
                     choicelist=['greet', 'farewell'],
                     default='')
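One detail worth knowing when the masks can overlap: np.select uses the first condition in condlist that is true for a row. A small sketch, where the frame name demo and its values are hypothetical (the first row contains both strings, the last contains neither) and default='' is an assumed fallback:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 matches both masks, row 2 matches neither.
demo = pd.DataFrame({"C": ["hello; bye; Jon", "bye; Jack", "Jill"]})

m1 = demo["C"].str.contains("hello;")
m2 = demo["C"].str.contains("bye;")

# m1 is listed first, so it wins wherever both masks are True.
demo["B"] = np.select(condlist=[m1, m2],
                      choicelist=["greet", "farewell"],
                      default="")

result = demo["B"].tolist()
print(result)  # ['greet', 'farewell', '']
```

So the order of condlist acts as a priority list: put the more specific condition first.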
Upvotes: 1