Reputation: 15718
I would like to create a boolean DataFrame in which np.nan becomes False and any positive real value becomes True.
import numpy as np
import pandas as pd
DF = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]})
DF.apply(bool) # Does not work
DF.where(DF.isnull() == False) # Does not work
DF[DF.isnull() == False] # Does not work
Upvotes: 3
Views: 1996
Reputation: 575
Comparing notnull() and isnan() on a df that includes a malformed (non-numeric) column:
df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan],'c':['fish','bear','cat','dog',np.nan]})
%%timeit
legit_dexes = np.isnan(df[df<=""].astype(float)) == False
1000 loops, best of 3: 632 us per loop
%%timeit
legit_dexes = pd.notnull(df)
1000 loops, best of 3: 751 us per loop
This variation, which ignores malformed columns, has similar timing:
%%timeit
legit_dexes = np.isnan(df[df.columns[df.apply(lambda x: not np.any(x.values>=""))]]) == False
1000 loops, best of 3: 681 us per loop
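As far as I can tell, the comparisons against "" above rely on Python 2 ordering between numbers and strings, so they may not run on Python 3. A minimal sketch of the same idea (ignore the malformed column, then test for NaN), assuming a pandas version that provides select_dtypes; the variable name numeric is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, np.nan],
    'b': [np.nan, np.nan, np.nan, 5, np.nan],
    'c': ['fish', 'bear', 'cat', 'dog', np.nan],
})

# Keep only the numeric columns; the object column 'c' is dropped.
numeric = df.select_dtypes(include='number')

# np.isnan marks NaN entries; ~ negates, so True means "has a value".
legit_dexes = ~np.isnan(numeric)
print(legit_dexes)
```

This avoids calling np.isnan on string data, which it cannot handle.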
Upvotes: 0
Reputation: 80386
Weird, but it looks like - np.isnan(df) outperforms pd.notnull(df) by a landslide:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]})
In [4]: - np.isnan(df)
Out[4]:
       a      b
0   True  False
1   True  False
2   True  False
3   True   True
4  False  False
In [5]: %timeit - np.isnan(df)
10000 loops, best of 3: 159 us per loop
In [6]: %timeit pd.notnull(df)
1000 loops, best of 3: 1.22 ms per loop
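The gap will likely differ across pandas and NumPy versions, so it is worth re-measuring locally. A rough sketch for doing that outside IPython, using ~ rather than unary minus for the negation (an assumption on my part that ~ is the safer spelling on current versions); it also checks that both approaches agree on an all-numeric frame:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, np.nan],
    'b': [np.nan, np.nan, np.nan, 5, np.nan],
})

# Both expressions should give the same boolean frame for numeric data.
via_isnan = ~np.isnan(df)
via_notnull = pd.notnull(df)
same = via_isnan.equals(via_notnull)
print("results match:", same)

# Time each approach; absolute numbers depend on machine and versions.
t_isnan = timeit.timeit(lambda: ~np.isnan(df), number=1000)
t_notnull = timeit.timeit(lambda: pd.notnull(df), number=1000)
print("~np.isnan(df): %.1f us per loop" % (t_isnan * 1000))
print("pd.notnull(df): %.1f us per loop" % (t_notnull * 1000))
```

Note that np.isnan only works on numeric data, whereas pd.notnull also handles object columns.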
Upvotes: 2
Reputation: 375685
There's a convenience function for not isnull, called notnull:
In [11]: pd.notnull(df)
Out[11]:
       a      b
0   True  False
1   True  False
2   True  False
3   True   True
4  False  False
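For completeness, a small sketch of the method form, assuming a pandas version recent enough to have DataFrame.notna and DataFrame.isna (the names mask and counts are only illustrative). It shows that the result is the negation of the null mask and that it can be summed to count valid entries per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, np.nan],
    'b': [np.nan, np.nan, np.nan, 5, np.nan],
})

# Method form of pd.notnull(df): True where a value is present.
mask = df.notna()

# Same result as negating the null mask.
matches_negation = mask.equals(~df.isna())

# Summing booleans counts the non-NaN entries in each column.
counts = mask.sum()
print(mask)
print(counts)
```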
Upvotes: 2