Reputation: 3491
I have a pandas DataFrame with a column that, when printed, appears to contain mostly NaN. The column has dtype object. However, when I run isnull on it, it returns False everywhere. Why are the NaN values not treated as missing, and is there a way to convert them to proper missing values?
Thank you.
Upvotes: 2
Views: 1408
Reputation: 45
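Building on piRSquared's answer, one possible way to handle NaN values (if applicable to your problem) is to replace them with the mean of the column. This only affects real missing values in numeric columns, so the 'NaN' strings have to be converted first.
df = df.fillna(df.mean(numeric_only=True))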
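The fill above only helps once the column is numeric and the 'NaN' strings have become real missing values. A minimal sketch of that sequence, assuming a hypothetical column of numbers stored as strings (the sample data is made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical object-dtype column: numbers stored as strings, plus a 'NaN' string
s = pd.Series(['1.0', 'NaN', '3.0'], dtype=object)

# Coerce to numeric; anything that cannot be parsed as a number becomes a real NaN
numeric = pd.to_numeric(s, errors='coerce')

# Fill the real missing values with the mean of the remaining values
filled = numeric.fillna(numeric.mean())
print(filled)
```

Whether a mean fill is appropriate depends on the data; if the column holds genuine text, this approach does not apply.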
Upvotes: -1
Reputation: 294198
Your NaN values are strings.
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(A=['Not NaN', 'NaN', np.nan]))
print(df)
A
0 Not NaN
1 NaN
2 NaN
What's missing
print(df.isnull())
A
0 False
1 False
2 True
The strings are not missing; the np.nan is.
You can mask the strings with
df.A.mask(df.A.eq('NaN')).isnull()
0 False
1 True
2 True
Name: A, dtype: bool
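To make the conversion permanent rather than only checking the result, the masked column can be assigned back. A minimal sketch, assuming the placeholder is exactly the string 'NaN' (other spellings such as 'nan' or ' NaN' would need their own handling):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=['Not NaN', 'NaN', np.nan]))

# Overwrite the column so the 'NaN' strings become real missing values
df['A'] = df['A'].mask(df['A'].eq('NaN'))

# isnull now flags both the former string and the original np.nan
missing_flags = df['A'].isnull().tolist()
print(missing_flags)  # [False, True, True]
```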
Upvotes: 2