Vincent Teyssier

Reputation: 2217

isnull(df.any) is returning False despite NaN in the DataFrame

I am trying to check whether a DataFrame contains null or NaN values in the following way:

import numpy as np
import pandas as pd

# load the time series of known points
y = [np.array([])]

y = [np.array([10, 11, 11, 11, 2, 4, 3, 7, 8, 9]),  
     np.array([1, 1, 1, 2, 2, 2, 2, 3, 2, 1]), 
     np.array([1, 1, 2, 2, 2, 2, 3, 2, 1])]
y = pd.DataFrame(y)



for i in range(len(y)):
    if (pd.isnull(y.any) == True):
        print ("Error: t2 array index " + str(i) + " has NaN or null!")
print("All good bro")
print (pd.isnull(y))
print (pd.isnull(y.any))

When I print y, you can clearly see that the last element of the third row is NaN. This is because the third numpy array is shorter than the others: pandas automatically fills the missing value with NaN to keep the DataFrame rectangular.
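A minimal sketch of that padding behaviour, using plain lists of unequal length rather than the arrays above (the name `ragged` is only illustrative):

```python
import pandas as pd

# Two rows of unequal length: the shorter one is padded with NaN
ragged = pd.DataFrame([[1, 2, 3], [4, 5]])

print(ragged.shape)     # number of rows and columns after padding
print(ragged.isnull())  # boolean mask marking the padded cell
```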

However, if I try to print pd.isnull(y.any) I get False. I tried y.all instead and also got False. I also tried y[1].any, with the same outcome.

What am I missing here?

Upvotes: 1

Views: 1200

Answers (1)

Nihal

Reputation: 5334

You can call isnull() on the DataFrame itself and then check whether any cell in the result is True:

y.isnull().values.any()

Code:

import numpy as np
import pandas as pd

# load the time series of known points
y = [np.array([])]

y = [np.array([10, 11, 11, 11, 2, 4, 3, 7, 8, 9]),  
     np.array([1, 1, 1, 2, 2, 2, 2, 3, 2, 1]), 
     np.array([1, 1, 2, 2, 2, 2, 3, 2, 1])]
y = pd.DataFrame(y)

# isnull() flags each cell, .values gives the boolean ndarray, any() reduces it to one bool
print(y.isnull().values.any())

Output:

True
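
As for why the original check returns False: y.any without parentheses is the method object itself, not the result of calling it, and pd.isnull on a method object is False because a method is not a missing value. If the goal is to report which rows contain NaN, as the loop in the question seems to intend, something along these lines may help. This is only a sketch: the data is the same as in the question but written as lists with the missing value spelled out as np.nan, and the names method_is_null, rows_with_nan and bad_rows are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as in the question, with the padded cell written explicitly
y = pd.DataFrame([
    [10, 11, 11, 11, 2, 4, 3, 7, 8, 9],
    [1, 1, 1, 2, 2, 2, 2, 3, 2, 1],
    [1, 1, 2, 2, 2, 2, 3, 2, 1, np.nan],
])

# y.any (no parentheses) is a bound method, so isnull reports it as not null
method_is_null = pd.isnull(y.any)

# One boolean per row: True if that row holds at least one NaN
rows_with_nan = y.isnull().any(axis=1)

# Index labels of the rows that contain NaN
bad_rows = rows_with_nan[rows_with_nan].index.tolist()

for i in bad_rows:
    print("Error: t2 array index " + str(i) + " has NaN or null!")
if not bad_rows:
    print("All good")
```

Using any(axis=1) reduces across the columns of each row, so no explicit loop over the DataFrame is needed to find the offending rows.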

Upvotes: 2
