I have a dataframe:
a b c
0 nan Y nan
1 23 N 3
2 nan N 2
3 44 Y nan
I wish to have this output:
a b c d
0 nan Y nan nan
1 23 N 3 69
2 nan N 2 nan
3 44 Y nan 44
I want the following condition: when column a is null, d is null; else, if column b is "N" and column c is not null, d equals a * c; otherwise, d equals a.
I wrote this code, but I get an error:
def f4(row):
    if row['a']==np.nan:
        return np.nan
    elif row['b']=="N" & row(row['c'].notnull()):
        return row['a']*row['c']
    else:
        return row['a']
DF['P1']=DF.apply(f4,axis=1)
Can anyone point out where my mistake is? I have referred to and tried the approach in "Creating a new column based on if-elif-else condition", but I also get an error.
Upvotes: 30
Views: 70213
Reputation: 1612
Use pd.isnull(df['Description'][i]) or pd.isna(df['Description'][i]).
Upvotes: 27
Reputation: 1347
Use pd.isnull() instead of == np.nan.
Example:
>>> x1 = np.nan
>>> x1 == np.nan
False
>>> pd.isnull(x1)
True
>>> pd.isna(x1)
True
See also:
The difference between comparison to np.nan and isnull()
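A minimal sketch of why the equality test in the question never fires: NaN compares unequal to everything, including itself, whereas pd.isna checks for missingness and also works element-wise on a whole Series. The small Series below is made-up illustration data.

```python
import numpy as np
import pandas as pd

x = np.nan

# NaN is never equal to anything, not even itself,
# so a test like `row['a'] == np.nan` is always False.
print(x == np.nan)   # False

# pd.isna (alias: pd.isnull) tests for a missing value instead.
print(pd.isna(x))    # True

# The same function works element-wise on a Series.
s = pd.Series([np.nan, 23.0, np.nan, 44.0])
print(pd.isna(s).tolist())  # [True, False, True, False]
```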
Upvotes: 1
Reputation: 95872
Since you just want NaNs to be propagated, multiplying the columns takes care of that for you:
>>> df = pd.read_clipboard()
>>> df
a b c
0 NaN Y NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
>>> df.a * df.c
0 NaN
1 69.0
2 NaN
3 NaN
dtype: float64
>>>
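pd.read_clipboard() depends on whatever is on your clipboard, so here is a reproducible sketch that builds the question's sample frame directly (the DataFrame(...) call is an assumed reconstruction of that data) and shows that multiplication propagates NaN without any explicit null check.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question instead of reading the clipboard.
df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Arithmetic involving NaN yields NaN, so missing a or c gives missing product.
product = df.a * df.c
print(product.tolist())  # [nan, 69.0, nan, nan]
```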
If you want to do it based on a condition, you can use np.where here instead of .apply. All you need is the following:
>>> df
a b c
0 NaN Y NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
Propagating NaN is the default behavior for most operations involving NaN, so you can simply assign the result of the above:
>>> df['d'] = np.where(df.b == 'N', df.a*df.c, df.a)
>>> df
a b c d
0 NaN Y NaN NaN
1 23.0 N 3.0 69.0
2 NaN N 2.0 NaN
3 44.0 Y NaN 44.0
>>>
To elaborate on what
np.where(df.b == 'N', df.a*df.c, df.a)
is doing: you can think of it as "where df.b == 'N', give me the result of df.a * df.c; everywhere else, give me just df.a":
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
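The element-wise selection np.where performs can be seen on plain NumPy arrays; the arrays below are hypothetical stand-ins for the DataFrame columns. One detail worth knowing: both the "true" and "false" arrays are computed in full before np.where picks from them, so df.a*df.c is evaluated for every row, not only where df.b == 'N'.

```python
import numpy as np

cond = np.array([False, True, True, False])   # stands in for df.b == 'N'
if_true = np.array([10.0, 20.0, 30.0, 40.0])  # stands in for df.a * df.c
if_false = np.array([1.0, 2.0, 3.0, 4.0])     # stands in for df.a

# Position by position: take if_true[i] where cond[i] is True, else if_false[i].
result = np.where(cond, if_true, if_false)
print(result.tolist())  # [1.0, 20.0, 30.0, 4.0]
```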
Also note, if your dataframe were a little different:
>>> df
a b c
0 NaN Y NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
>>> df.loc[0,'a'] = 99
>>> df.loc[0, 'b']= 'N'
>>> df
a b c
0 99.0 N NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
Then the following would not be equivalent:
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
>>> np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
array([ 99., 69., nan, 44.])
So you might want to use the slightly more verbose:
>>> df['d'] = np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
>>> df
a b c d
0 99.0 N NaN 99.0
1 23.0 N 3.0 69.0
2 NaN N 2.0 NaN
3 44.0 Y NaN 44.0
>>>
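A self-contained sketch of the case above, with row 0 changed to a=99, b='N' as in the answer: the shorter condition turns 99 into NaN because 99 * NaN is NaN, while the condition that also checks c falls back to a and keeps 99. Here df.c.notna() is used as an equivalent of ~df.c.isnull().

```python
import numpy as np
import pandas as pd

# Variant of the sample frame: row 0 has a=99, b='N' and c missing.
df = pd.DataFrame({
    "a": [99.0, 23.0, np.nan, 44.0],
    "b": ["N", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

# Multiplies whenever b == 'N', even if c is missing.
short = np.where(df.b == "N", df.a * df.c, df.a)
# Multiplies only when c is present; otherwise falls back to a.
verbose = np.where((df.b == "N") & df.c.notna(), df.a * df.c, df.a)

print(short.tolist())    # [nan, 69.0, nan, 44.0]
print(verbose.tolist())  # [99.0, 69.0, nan, 44.0]
```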
Upvotes: 7
Reputation: 38415
You can try
df['d'] = np.where((df.b == 'N') & (pd.notnull(df.c)), df.a*df.c, np.where(pd.notnull(df.a), df.a, np.nan))
a b c d
0 NaN Y NaN NaN
1 23.0 N 3.0 69.0
2 NaN N 2.0 NaN
3 44.0 Y NaN 44.0
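See the documentation for pandas notnull. In your current code, row['c'] is a scalar, which has no .notnull() method, so change row['c'].notnull() to pd.notnull(row['c']). You also need to drop the stray row(...) call and wrap each comparison in parentheses, because & binds more tightly than ==. That said, np.where works on whole columns at once, so it should be more efficient than a row-by-row apply.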
def f4(row):
    # == np.nan is always False (NaN is not equal to itself), so use pd.isnull
    if pd.isnull(row['a']):
        return np.nan
    elif (row['b']=="N") & (pd.notnull(row.c)):
        return row['a']*row['c']
    else:
        return row['a']
df['d']=df.apply(f4,axis=1)
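For completeness, a runnable sketch of the row-wise apply version with every null check done through pd.isnull / pd.notnull, run on the question's sample data (rebuilt by hand here, which is an assumption) and compared against the vectorized np.where result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

def f4(row):
    # Missing-value check that actually works on a scalar NaN.
    if pd.isnull(row["a"]):
        return np.nan
    # Inside a row function the values are scalars, so plain `and` is enough.
    elif row["b"] == "N" and pd.notnull(row["c"]):
        return row["a"] * row["c"]
    else:
        return row["a"]

df["d"] = df.apply(f4, axis=1)

# Vectorized equivalent of the same rule, for comparison.
vectorized = np.where((df.b == "N") & df.c.notna(), df.a * df.c, df.a)
print(df["d"].tolist())  # [nan, 69.0, nan, 44.0]
```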
Upvotes: 6
Reputation: 153460
You don't need apply; use np.where:
df['d'] = np.where(df.a.isnull(),
                   np.nan,
                   np.where((df.b == "N")&(~df.c.isnull()),
                            df.a*df.c,
                            df.a))
Output:
a b c d
0 NaN Y NaN NaN
1 23.0 N 3.0 69.0
2 NaN N 2.0 NaN
3 44.0 Y NaN 44.0
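As an alternative to nesting np.where calls, NumPy's np.select takes a list of conditions and a matching list of choices and uses the first condition that holds, which maps directly onto an if/elif/else chain. This is a swapped-in technique rather than one used in the answers above; a sketch on the question's sample data (rebuilt by hand, as an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Conditions are checked in order, like if / elif; the first True one wins.
conditions = [
    df.a.isna(),                   # if a is null
    (df.b == "N") & df.c.notna(),  # elif b == 'N' and c is not null
]
choices = [
    np.nan,                        # then d is null
    df.a * df.c,                   # then d = a * c
]
# `default` plays the role of the final else branch: d = a.
df["d"] = np.select(conditions, choices, default=df.a)
print(df["d"].tolist())  # [nan, 69.0, nan, 44.0]
```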
Upvotes: 12