Bindiya12

Reputation: 3491

Unable to handle NaN in pandas dataframe

I have a pandas dataframe with a column of dtype object that, when printed, appears to contain mostly NaN. However, when I run isnull on it, it returns False everywhere. Why are these NaN values not treated as missing, and is there a way to convert them into proper missing values?

Thank you.

Upvotes: 2

Views: 1408

Answers (2)

David Armstrong

Reputation: 45

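Building on piRSquared's answer, one possible way to treat NaN values (if applicable to your problem) is to replace them with the mean of each column. This only fills real missing values in numeric columns, so the 'NaN' strings have to be converted first. Use df.median in place of df.mean if you want the median instead.

df = df.fillna(df.mean(numeric_only=True))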
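As a rough sketch of how that could fit together for a column that is meant to be numeric: the frame and the column name x below are made up for illustration, and pd.to_numeric with errors='coerce' is one possible way to get real NaN values, not necessarily what the asker's data needs.

```python
import pandas as pd

# Hypothetical data: a numeric column stored as objects, with the
# string 'NaN' standing in for a missing value.
df = pd.DataFrame({'x': [1.0, 'NaN', 3.0]})

# Coerce to a numeric dtype; the 'NaN' string (and anything that cannot
# be parsed as a number) ends up as a real NaN.
df['x'] = pd.to_numeric(df['x'], errors='coerce')

# Fill the real missing values with the column mean
# (swap in .median() to use the median instead).
df['x'] = df['x'].fillna(df['x'].mean())

filled = df['x'].tolist()
print(filled)
```

If the column holds genuine text rather than numbers, coercing would turn every entry into NaN, so this approach only makes sense for numeric data.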

Upvotes: -1

piRSquared

Reputation: 294198

Your NaN values are strings, not real missing values.

import numpy as np
import pandas as pd
df = pd.DataFrame(dict(A=['Not NaN', 'NaN', np.nan]))
print(df)

         A
0  Not NaN
1      NaN
2      NaN

What's missing

print(df.isnull())

       A
0  False
1  False
2   True

The strings are not missing; only the np.nan values are.

You can mask the strings so that they are treated as missing too:

df.A.mask(df.A.eq('NaN')).isnull()

0    False
1     True
2     True
Name: A, dtype: bool
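To make the conversion permanent rather than only masking for the check, the masked result can be assigned back to the column. A minimal sketch, assuming the placeholder is exactly the string 'NaN' (no surrounding whitespace or alternative spellings):

```python
import numpy as np
import pandas as pd

# Same example frame as above: a real string, a 'NaN' string, and a true np.nan.
df = pd.DataFrame(dict(A=['Not NaN', 'NaN', np.nan]))

# Replace entries equal to the literal string 'NaN' with a real missing value.
# Assumption: the placeholder is spelled exactly 'NaN'.
df['A'] = df['A'].mask(df['A'].eq('NaN'))

# isnull now flags both the former string and the original np.nan.
missing_flags = df['A'].isnull().tolist()
print(missing_flags)
```

After this, isnull, dropna and fillna should all treat the former strings like any other missing value.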

Upvotes: 2
