Botacco

Reputation: 179

Create a new Pandas df column with boolean values that depend on another column

I need to add a new column to a Pandas dataframe.

If the column "INDUCING" contains text (i.e. the value is neither null nor ""), I need to add a 1, otherwise a 0.

I tried with

df['newColumn'] = np.where(df['INDUCING']!="", 1, 0)

This works for values that are empty strings (""), but not for null values: None and NaN compare unequal to "", so those rows get a 1.

Any idea on how to add this column correctly?
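A minimal reproduction of the problem, assuming hypothetical sample data that contains a string, an empty string, None and NaN (dtype=object is forced here so pandas keeps None as-is):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: text, empty string, None and NaN
df = pd.DataFrame({'INDUCING': pd.Series(['some text', '', None, np.nan], dtype=object)})

# The attempt from the question: only the "" row is mapped to 0,
# because None and NaN both compare unequal to ""
df['newColumn'] = np.where(df['INDUCING'] != "", 1, 0)
print(df['newColumn'].tolist())  # [1, 0, 1, 1]
```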

Upvotes: 0

Views: 914

Answers (3)

fuglede

Reputation: 18211

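As the built-in bool returns True for a string exactly when it is non-empty (and bool(None) is False), you can achieve this simply with the line below. Note that this relies on the missing values being None: NaN is truthy, so if the column holds NaN, call .fillna('') before .astype(bool).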

df['newColumn'] = df['INDUCING'].astype(bool).astype(int)
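To see the None versus NaN difference concretely, here is a small sketch on a hypothetical object-dtype series:

```python
import numpy as np
import pandas as pd

s = pd.Series(['some text', '', None, np.nan], dtype=object)

# bool('') and bool(None) are False, but bool(float('nan')) is True
print(s.astype(bool).astype(int).tolist())  # [1, 0, 0, 1]

# Filling missing values with '' first makes NaN behave like None
print(s.fillna('').astype(bool).astype(int).tolist())  # [1, 0, 0, 0]
```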

Some performance comparisons:

In [61]: df = pd.DataFrame({'INDUCING': ['test', None, '', 'more test']*10000})

In [63]: %timeit np.where(df['INDUCING'].fillna('') != "", 1, 0)
5.68 ms ± 500 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [62]: %timeit (df['INDUCING'].ne('') & df['INDUCING'].notnull()).astype(int)
5.1 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [64]: %timeit np.where(df['INDUCING'], 1, 0)
667 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [65]: %timeit df['INDUCING'].astype(bool).astype(int)
655 µs ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [99]: %timeit df['INDUCING'].values.astype(bool).astype(int)
553 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
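Timings are only comparable if the expressions compute the same result. A quick sketch that rebuilds data shaped like the benchmark's (dtype=object is an assumption added here so that None is kept as-is across pandas versions) and checks that the five expressions agree:

```python
import numpy as np
import pandas as pd

# Same pattern as the benchmark data, with None as the missing value
df = pd.DataFrame({'INDUCING': pd.Series(['test', None, '', 'more test'] * 10000, dtype=object)})
col = df['INDUCING']

results = [
    np.where(col.fillna('') != "", 1, 0),
    (col.ne('') & col.notnull()).astype(int).to_numpy(),
    np.where(col, 1, 0),
    col.astype(bool).astype(int).to_numpy(),
    col.values.astype(bool).astype(int),
]

# Every approach should produce the same 1/0 pattern on this data
for r in results[1:]:
    assert np.array_equal(results[0], r)
print(results[0][:4])  # [1 0 0 1]
```

On this data all five agree because the missing value is None; with NaN, the astype(bool) and np.where(df['INDUCING'], 1, 0) variants would return 1 for the missing rows.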

Upvotes: 0

jpp

Reputation: 164753

By De Morgan's laws, NOT(cond1 OR cond2) is equivalent to NOT(cond1) AND NOT(cond2). Here cond1 is "the value is an empty string" and cond2 is "the value is null".

You can combine conditions via the bitwise "and" (&) / "or" (|) operators as appropriate. This gives a Boolean series, which you can then cast to int:

df['newColumn'] = (df['INDUCING'].ne('') & df['INDUCING'].notnull()).astype(int)
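A small check of the De Morgan equivalence on a hypothetical object-dtype series, building both sides explicitly:

```python
import numpy as np
import pandas as pd

s = pd.Series(['some text', '', None, np.nan], dtype=object)

is_empty = s.eq('')    # cond1: the value is the empty string
is_null = s.isnull()   # cond2: the value is None or NaN

# NOT(cond1 OR cond2) ...
lhs = ~(is_empty | is_null)
# ... equals NOT(cond1) AND NOT(cond2)
rhs = s.ne('') & s.notnull()

assert lhs.equals(rhs)
print(rhs.astype(int).tolist())  # [1, 0, 0, 0]
```

The parentheses around each side matter: & and | bind more tightly than comparison operators in Python.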

Upvotes: 2

Anton vBR

Reputation: 18916

The easiest way would be to call .fillna('') first:

df['newColumn'] = np.where(df['INDUCING'].fillna('') != "", 1, 0)

Or call .astype(int) directly on the Boolean mask, which converts True to 1 and False to 0:

df['newColumn'] = (df['INDUCING'].fillna('') != '').astype(int)
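A sketch showing that both variants of this answer give the same result on hypothetical data containing both None and NaN (the column names viaWhere and viaAstype are made up for the comparison):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'INDUCING': pd.Series(['some text', '', None, np.nan], dtype=object)})

# fillna('') turns both None and NaN into '', so a single comparison suffices
mask = df['INDUCING'].fillna('') != ''

df['viaWhere'] = np.where(mask, 1, 0)
df['viaAstype'] = mask.astype(int)  # True -> 1, False -> 0

print(df['viaWhere'].tolist(), df['viaAstype'].tolist())  # [1, 0, 0, 0] [1, 0, 0, 0]
```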

Upvotes: 1
