Reputation: 179
I need to add a new column to a Pandas dataframe.
If the column "INDUCING" contains text (not null and not ""), the new column should be 1, otherwise 0.
I tried:
df['newColumn'] = np.where(df['INDUCING']!="", 1, 0)
This works for values that are empty strings (""), but it does not work when the value is null.
Any idea on how to add this column correctly?
Upvotes: 0
Views: 914
Reputation: 18211
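The built-in bool returns True for a string exactly when it is non-empty, and False for None, so you can achieve this simply with the line below. Note that NaN is truthy, so this only works if the missing values are stored as None rather than NaN: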
df['newColumn'] = df['INDUCING'].astype(bool).astype(int)
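To make the truthiness behaviour concrete, here is a minimal sketch on made-up sample data (the four values and the names flags and safe are assumptions for illustration, not from the question). It suggests that None and "" map to 0, while a NaN would map to 1 unless it is filled first:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a non-empty string, None, an empty string, and NaN.
# dtype=object keeps None and NaN as distinct Python objects.
df = pd.DataFrame({'INDUCING': pd.Series(['test', None, '', np.nan], dtype=object)})

# bool('test') is True; bool(None) and bool('') are False; bool(nan) is True.
flags = df['INDUCING'].astype(bool).astype(int)

# Filling missing values with '' first makes None and NaN both falsy.
safe = df['INDUCING'].fillna('').astype(bool).astype(int)

print(flags.tolist())
print(safe.tolist())
```

If the column can contain NaN (for example after reading a CSV), the fillna('') variant is probably the safer choice.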
Some performance comparisons:
In [61]: df = pd.DataFrame({'INDUCING': ['test', None, '', 'more test']*10000})
In [63]: %timeit np.where(df['INDUCING'].fillna('') != "", 1, 0)
5.68 ms ± 500 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [62]: %timeit (df['INDUCING'].ne('') & df['INDUCING'].notnull()).astype(int)
5.1 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [64]: %timeit np.where(df['INDUCING'], 1, 0)
667 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [65]: %timeit df['INDUCING'].astype(bool).astype(int)
655 µs ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [99]: %timeit df['INDUCING'].values.astype(bool).astype(int)
553 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 0
Reputation: 164753
By De Morgan's laws, NOT(cond1 OR cond2) is equivalent to NOT(cond1) AND NOT(cond2).
You can combine conditions via the bitwise "and" (&) / "or" (|) operators as appropriate. This gives a Boolean series, which you can then cast to int:
df['newColumn'] = (df['INDUCING'].ne('') & df['INDUCING'].notnull()).astype(int)
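As a rough check of the De Morgan step, the sketch below builds both forms of the condition on made-up sample data (the values and the names is_blank and has_text are assumptions for illustration) and compares them:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a non-empty string, None, an empty string, and NaN.
df = pd.DataFrame({'INDUCING': pd.Series(['test', None, '', np.nan], dtype=object)})
s = df['INDUCING']

# "Blank" means: empty string OR null.
is_blank = s.eq('') | s.isnull()

# De Morgan: NOT(empty OR null) == (NOT empty) AND (NOT null).
has_text = s.ne('') & s.notnull()

# The two formulations should agree row by row.
same = bool(((~is_blank) == has_text).all())

df['newColumn'] = has_text.astype(int)
print(same, df['newColumn'].tolist())
```

The parentheses around each condition matter once comparison operators such as != are used instead of .ne(), because & and | bind more tightly than comparisons in Python.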
Upvotes: 2
Reputation: 18916
The easiest way would be to call .fillna('') first. Corrected version:
df['newColumn'] = np.where(df['INDUCING'].fillna('') != "", 1, 0)
Alternatively, call .astype(int) directly on the mask. This converts True to 1 and False to 0:
df['newColumn'] = (df['INDUCING'].fillna('') != '').astype(int)
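The two variants can be compared side by side with a small sketch; the sample values and the names filled, via_where and via_mask are assumptions for illustration. The main practical difference appears to be the return type: np.where gives a NumPy array, while the mask gives a pandas Series that keeps the index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value and one empty string.
df = pd.DataFrame({'INDUCING': pd.Series(['test', None, '', 'more test'], dtype=object)})

# Replace missing values with '' so a single comparison covers both cases.
filled = df['INDUCING'].fillna('')

via_where = np.where(filled != '', 1, 0)   # NumPy array of 0/1
via_mask = (filled != '').astype(int)      # pandas Series of 0/1

# Either result can be assigned as a new column.
df['newColumn'] = via_mask
print(via_where.tolist(), via_mask.tolist())
```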
Upvotes: 1