shinhong

Reputation: 446

Pandas does not convert NaN to None properly

Environment

What I'm trying to do

Replace all the NaN values in a dataframe with None

What I have now

In[6]: import pandas as pd
In[7]: import numpy as np
In[8]: df = pd.DataFrame({"a":[1,np.nan],"b":[np.nan,"foo"]})
In[9]: df
Out[9]: 
     a    b
0  1.0  NaN
1  NaN  foo


In[10]: pd.notnull(df)
Out[10]: 
       a      b
0   True  False
1  False   True


In[11]: df.where(pd.notnull(df), None)
Out[11]: 
     a     b
0  1.0  None
1  NaN   foo

Expected Output

In[11]: df.where(pd.notnull(df), None)
Out[11]: 
     a     b
0  1.0  None
1  None  foo

I have tested this on another machine with Python 3.8.5 and pandas==1.1.1, and it worked as expected. Is this a bug?

Thank you!

Upvotes: 1

Views: 1863

Answers (1)

user3483203

Reputation: 51165

This is not a bug. In fact, the result you are seeing in pandas==1.1.1 is the bug, which was fixed in later versions by PR39761.

The fix is also mentioned in the 1.3.0 release notes.

In general, pandas tries to avoid results that contain object dtype columns, and this is no exception: column a stays float, so the None you pass in is stored as NaN. If you would like to force the cast, you can use:

>>> df.astype(object).where(pd.notnull(df), None)
      a     b
0   1.0  None
1  None   foo
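
For completeness, here is a self-contained sketch of the same workaround that you can paste into a fresh session. It rebuilds the frame from the question and uses `df.notna()` in place of `pd.notnull(df)`, which should be equivalent. The names `floats` and `result` are just illustrative, and I have only reasoned about this against recent pandas versions, so treat it as a sketch rather than a guarantee for every release:

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({"a": [1, np.nan], "b": [np.nan, "foo"]})

# A float64 column cannot hold None, so pandas stores it as NaN instead.
floats = pd.Series([1.0, None])
print(floats.dtype)

# Casting to object first lets every column hold arbitrary Python objects,
# so where() can write a real None into the masked cells.
result = df.astype(object).where(df.notna(), None)
print(result.dtypes)
print(result["a"].tolist())
```

Keep in mind that the result has object dtype columns, so numeric operations on it will be slower. It is usually best to do this conversion only at the very end, for example right before handing the values to a database driver or a JSON serializer.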

There seems to have been some grumbling in the community about this bug fix, discussed here.

Upvotes: 3
