Reputation: 859
I am trying to assign a column based on strings that may be contained in other columns. For example
var1 = 67
columns = {'col1': ['string1', 'thang2', 'code3', 'string2'],
'col2': [1, 2, np.nan, 3], 'col3': ['I', 'cant', 'think', 'what']}
df = pd.DataFrame(data = columns)
How do I then make a fourth column col4
that is col3 + var1 + col1
most of the time, but is np.nan
whenever col2
is nan
(in the same row) and has a -W
appended to its value whenever there is an 'in'
in any the string in col1
(again, in the same row)?
I know all about assign
, but I don't know how to do all that conditional stuff in the assign, or if there is a way to do it after creating the column, I'm not sure either.
Upvotes: 2
Views: 380
Reputation: 153460
You can try this using np.where
:
df['col4'] = np.where(df['col2'].notnull(),
df['col3'] + str(var1) + np.where(df['col1'].str.contains('in'),
df['col1'] + '-w',
df['col1']),
np.nan)
Output:
col1 col2 col3 col4
0 string1 1.0 I I67string1-w
1 thang2 2.0 cant cant67thang2
2 code3 NaN think NaN
3 string2 3.0 what what67string2-w
Or if you want to do it with assign
:
df.assign(col5 = np.where(df['col2'].notnull(),
df['col3'] + str(var1) + np.where(df['col1'].str.contains('in'),
df['col1'] + '-w',
df['col1']),
np.nan))
Output:
col1 col2 col3 col4 col5
0 string1 1.0 I I67string1-w I67string1-w
1 thang2 2.0 cant cant67thang2 cant67thang2
2 code3 NaN think NaN NaN
3 string2 3.0 what what67string2-w what67string2-w
Update: Since you mentioned speed. I think I'd remove the .str accessor and use list comprehension too.
df['col4'] = np.where(df['col2'].notnull(),
df['col3'] + str(var1) + np.where(['in' in i for i in df['col1']],
df['col1'] + '-w',
df['col1']),
np.nan)
Upvotes: 4