user6315578

Error: float object has no attribute notnull

I have a dataframe:

  a     b     c
0 nan   Y     nan
1  23   N      3
2 nan   N      2
3  44   Y     nan

I wish to have this output:

  a     b     c      d
0 nan   Y     nan   nan
1  23   N      3     69
2 nan   N      2    nan
3  44   Y     nan    44

The condition I want is: when column a is null, d is null; else if column b is N and column c is not null, d equals column a * column c; otherwise d equals column a.

I have written this code, but I get the error:

def f4(row):
    if row['a']==np.nan:
       return np.nan
    elif row['b']=="N" & row(row['c'].notnull()):
       return row['a']*row['c']
    else:
       return row['a']

 DF['P1']=DF.apply(f4,axis=1)

Can anyone help point out where my mistake is? I have referred to "Creating a new column based on if-elif-else condition" and tried that approach, but I still get the error.

Upvotes: 30

Views: 70213

Answers (5)

Max Kleiner

Reputation: 1612

Use

pd.isnull(df['Description'][i])

or

pd.isna(df['Description'][i])
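The error in the title arises because `row['c']` inside `apply(..., axis=1)` is a plain float scalar, and scalars have no `.notnull()` method, whereas the module-level `pd.isna` and `pd.notna` accept scalars as well as Series. A minimal sketch, where `value` is a hypothetical stand-in for a single cell such as `df['Description'][i]`:

```python
import numpy as np
import pandas as pd

value = np.nan  # hypothetical scalar, standing in for one cell of a DataFrame

# A float scalar has no .notnull() method, which is what the error reports.
has_method = hasattr(value, "notnull")

# The module-level functions accept scalars as well as Series.
is_missing = pd.isna(value)
is_present = pd.notna(value)

print(has_method, is_missing, is_present)
```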

Upvotes: 27

JiangKui

Reputation: 1347

Use pd.isnull() instead of == np.nan.

Example:

>>> x1 = np.nan
>>> x1 == np.nan
False
>>> pd.isnull(x1)
True
>>> pd.isna(x1)
True

See also:

The difference between comparison to np.nan and isnull()
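The same pitfall applies to a whole column. As a small sketch, assuming a Series `s` that mirrors column a from the question, comparing against `np.nan` yields all False, while `isna()` flags the missing entries:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 23.0, np.nan, 44.0])  # mirrors column a from the question

eq_mask = (s == np.nan)  # NaN never compares equal, so every entry is False
na_mask = s.isna()       # True exactly where the value is missing

print(eq_mask.tolist())
print(na_mask.tolist())
```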

Upvotes: 1

juanpa.arrivillaga

Reputation: 95872

Since you just want NaNs to be propagated, multiplying the columns takes care of that for you:

>>> df = pd.read_clipboard()
>>> df
      a  b    c
0   NaN  Y  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
>>> df.a * df.c
0     NaN
1    69.0
2     NaN
3     NaN
dtype: float64
>>>

If you want to do it on a condition, you can use np.where here instead of .apply. All you need is the following:

>>> df
      a  b    c
0   NaN  Y  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan,  69.,  nan,  44.])

NaN propagation is the default behavior for most operations involving NaN, so you can simply assign the result of the above:

>>> df['d'] = np.where(df.b == 'N', df.a*df.c, df.a)
>>> df
      a  b    c     d
0   NaN  Y  NaN   NaN
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
>>>

To elaborate on what this does:

np.where(df.b == 'N', df.a*df.c, df.a)

You can think of it as "where df.b == 'N', give me the result of df.a * df.c; else, give me just df.a":

>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan,  69.,  nan,  44.])

Also note, if your dataframe were a little different:

>>> df
      a  b    c
0   NaN  Y  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
>>> df.loc[0,'a'] = 99
>>> df.loc[0, 'b']= 'N'
>>> df
      a  b    c
0  99.0  N  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN

Then the following would not be equivalent:

>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan,  69.,  nan,  44.])
>>> np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
array([ 99.,  69.,  nan,  44.])

So you might want to use the slightly more verbose:

>>> df['d'] = np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
>>> df
      a  b    c     d
0  99.0  N  NaN  99.0
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
>>>
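For anyone without the clipboard data, the comparison above can be reproduced as a standalone script. This is only a sketch: the frame is rebuilt by hand from the modified table, and `short` and `verbose` are illustrative names, not part of the original session.

```python
import numpy as np
import pandas as pd

# The modified frame from above, rebuilt by hand (row 0 has a=99, b='N', c=NaN).
df = pd.DataFrame({
    "a": [99.0, 23.0, np.nan, 44.0],
    "b": ["N", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

# Short form: row 0 becomes NaN because 99 * NaN is NaN.
short = np.where(df.b == "N", df.a * df.c, df.a)

# Verbose form: row 0 falls back to a because c is missing.
verbose = np.where((df.b == "N") & df.c.notnull(), df.a * df.c, df.a)

print(short)
print(verbose)
```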

Upvotes: 7

Vaishali

Reputation: 38415

You can try

df['d'] = np.where((df.b == 'N') & (pd.notnull(df.c)), df.a*df.c, np.where(pd.notnull(df.a), df.a, np.nan))


    a       b   c      d
0   NaN     Y   NaN    NaN
1   23.0    N   3.0    69.0
2   NaN     N   2.0    NaN
3   44.0    Y   NaN    44.0

See the documentation for pandas notnull. In your current code, the essential change is replacing series.notnull() with pd.notnull(series); np.where should be more efficient, though.

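import numpy as np
import pandas as pd

def f4(row):
    # NaN never compares equal to itself, so test with pd.isnull
    if pd.isnull(row['a']):
        return np.nan
    # row['b'] and row['c'] are scalars here, so use `and` and pd.notnull
    elif row['b'] == "N" and pd.notnull(row['c']):
        return row['a'] * row['c']
    else:
        return row['a']

df['d'] = df.apply(f4, axis=1)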

Upvotes: 6

Scott Boston

Reputation: 153460

You don't need apply; use np.where:

df['d'] = np.where(df.a.isnull(),
         np.nan,
         np.where((df.b == "N")&(~df.c.isnull()),
                  df.a*df.c,
                  df.a))

Output:

      a  b    c     d
0   NaN  Y  NaN   NaN
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
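As an alternative to nesting, the same three-way rule could probably be written with `np.select`, which takes a list of conditions and picks the first match. This is a different technique from the nested np.where above, shown here only as a sketch on a hand-built copy of the question's data:

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt by hand.
df = pd.DataFrame({
    "a": [np.nan, 23.0, np.nan, 44.0],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    df.a.isnull(),                    # a missing -> d is NaN
    (df.b == "N") & df.c.notnull(),   # b is N and c present -> a * c
]
choices = [np.nan, df.a * df.c]

# Rows matching no condition fall back to column a.
df["d"] = np.select(conditions, choices, default=df.a)
print(df)
```

With only two conditions the nested form is just as readable; the flat list tends to help once more branches are added.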

Upvotes: 12
