Erratic NaN behaviour in numpy/pandas

Question

I've been trying to replace missing values in a Pandas dataframe, without success. I tried the .fillna method, and I also tried looping through the entire data set, checking each cell and replacing NaNs with a chosen value. In both cases Python runs the script without raising any errors, but the NaN values remain.

When I dug a bit deeper, I discovered behaviour that seems erratic to me, best demonstrated with an example:

In [ ]: X['Smokinginpregnancy'].head()
Out[ ]:

Index
E09000002          NaN
E09000003     5.216126
E09000004    10.287496
E09000005     3.090379
E09000006     6.080041
Name: Smokinginpregnancy, dtype: float64

I know for a fact that the first item in this column is missing, and pandas recognises it as NaN. In fact, if I access this item on its own, Python tells me it's NaN:

In [ ]: X['Smokinginpregnancy'][0]
Out[ ]: nan

However, when I test whether it's NaN with ==, Python returns False.

In [ ]: X['Smokinginpregnancy'][0] == np.nan
Out[ ]: False

I suspect that when .fillna runs, it checks whether each item is NaN, gets back False, and so leaves the cell alone.
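A minimal reconstruction of the looping approach, assuming the loop tested each cell with == np.nan (the loop itself isn't shown). The frame is a hypothetical stand-in built from the head() output, and 0.0 is an arbitrary replacement value:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X, copied from the head() output; only the first cell is missing.
X = pd.DataFrame(
    {"Smokinginpregnancy": [np.nan, 5.216126, 10.287496, 3.090379, 6.080041]},
    index=["E09000002", "E09000003", "E09000004", "E09000005", "E09000006"],
)

# Assumed shape of the original loop: compare each cell with == np.nan.
naive = X.copy()
for label in naive.index:
    if naive.at[label, "Smokinginpregnancy"] == np.nan:  # never True, even for NaN
        naive.at[label, "Smokinginpregnancy"] = 0.0

# Same loop with a real NaN test (pandas.isna).
fixed = X.copy()
for label in fixed.index:
    if pd.isna(fixed.at[label, "Smokinginpregnancy"]):
        fixed.at[label, "Smokinginpregnancy"] = 0.0

print(naive["Smokinginpregnancy"].isna().sum())  # 1: the NaN is still there
print(fixed["Smokinginpregnancy"].isna().sum())  # 0
```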

Does anyone know what's going on? Any solutions (other than opening the CSV file in Excel and replacing the values by hand)?

I'm using Anaconda's Python 3 distribution.

Bakuriu · Accepted Answer

You are doing:

X['Smokinginpregnancy'][0] == np.nan

This is guaranteed to return False because, under the IEEE 754 standard, NaN compares unequal to everything, including itself:

>>> x = float('nan')
>>> x == x
False
>>> x == 1
False
>>> x == float('nan')
False
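The same rule covers ordering comparisons and NumPy arrays. One consequence is that x != x is True only when x is NaN. A short sketch:

```python
import numpy as np

x = float("nan")

# IEEE 754: ==, <, > involving NaN are all False ...
print(x == x, x < 1.0, x > 1.0)  # False False False
# ... while != is True, which gives the classic self-test for NaN.
print(x != x)                    # True

# NumPy applies the same rule element-wise.
a = np.array([np.nan, 1.0, 2.0])
print(a == np.nan)               # [False False False]
print(a != a)                    # [ True False False]
```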

See also here. To check for NaN, use math.isnan instead:

>>> import math
>>> math.isnan(x)
True

Or numpy.isnan, which also works element-wise on whole arrays.
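For comparison, pandas also has its own missing-value test, pandas.isna. A sketch of how the three checks differ; the array values are illustrative only:

```python
import math

import numpy as np
import pandas as pd

x = float("nan")

# On a plain float NaN all three checks agree.
print(math.isnan(x), np.isnan(x), pd.isna(x))  # True True True

# numpy.isnan is vectorised over float arrays.
arr = np.array([np.nan, 5.216126, 10.287496])
print(np.isnan(arr))                           # [ True False False]

# pandas.isna also treats None as missing and accepts object-dtype data,
# where numpy.isnan raises TypeError.
mixed = np.array([np.nan, None, "text"], dtype=object)
print(pd.isna(mixed))                          # [ True  True False]
try:
    np.isnan(mixed)
except TypeError:
    print("numpy.isnan does not support object dtype")
```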

So use:

numpy.isnan(X['Smokinginpregnancy'].iloc[0])
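To check a whole column rather than one cell, Series.isna returns a boolean mask. A sketch using a hypothetical Series that mirrors the question's output. Note that .iloc selects by position; plain [0] on a string index relies on a positional fallback that newer pandas versions warn about:

```python
import numpy as np
import pandas as pd

# Hypothetical column mirroring the question's head() output.
col = pd.Series(
    [np.nan, 5.216126, 10.287496, 3.090379, 6.080041],
    index=["E09000002", "E09000003", "E09000004", "E09000005", "E09000006"],
    name="Smokinginpregnancy",
)

# Single cell, selected by position.
print(np.isnan(col.iloc[0]))      # True

# Whole column: True wherever the value is missing.
mask = col.isna()
print(mask.sum())                 # 1
print(col[mask].index.tolist())   # ['E09000002']
```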

Regarding pandas.fillna, note that by default it returns a new, filled DataFrame rather than modifying the one you call it on. Maybe you did something like:

X.fillna(...)

without reassigning the result to X? Either write X = X.fillna(...), or pass inplace=True to modify the DataFrame you call the method on.
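A minimal sketch contrasting the two fixes, using small hypothetical frames rather than the asker's data (0 is an arbitrary fill value). With inplace=True the method returns None, so the result should not be assigned back:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value.
X = pd.DataFrame({"Smokinginpregnancy": [np.nan, 5.216126, 10.287496]})

X.fillna(0)                                   # returned copy is discarded
print(X["Smokinginpregnancy"].isna().sum())   # 1: X is unchanged

X = X.fillna(0)                               # keep the returned copy
print(X["Smokinginpregnancy"].isna().sum())   # 0

# inplace=True mutates Y and returns None.
Y = pd.DataFrame({"Smokinginpregnancy": [np.nan, 5.216126]})
result = Y.fillna(0, inplace=True)
print(result is None, Y["Smokinginpregnancy"].tolist())  # True [0.0, 5.216126]
```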
