I have a dataframe that might contain NaN values.
import numpy as np
import pandas as pd
array = np.empty((4,5))
array[:] = 10
df = pd.DataFrame(array)
df.iloc[1,3] = np.nan
df.isna().apply(lambda x: any(x), axis = 0)
Output:
0 False
1 False
2 False
3 True
4 False
dtype: bool
When I run:
any(df.isna())
It returns:
True
If there are no NaNs:
array = np.empty((4,5))
array[:] = 10
df = pd.DataFrame(array)
#df.iloc[1,3] = np.nan
df.isna().apply(lambda x: any(x), axis = 0)
0 False
1 False
2 False
3 False
4 False
dtype: bool
However, when I run:
any(df.isna())
It returns:
True
Why is this the case? Am I misunderstanding the built-in any() function?
Upvotes: 1
Views: 72
Reputation: 5433
Why is this the case? Am I misunderstanding the built-in any() function?
When you iterate over a DataFrame, you are actually iterating over its column labels, not its rows or values as you might expect. More precisely, a for loop calls DataFrame.__iter__, which returns an iterator over the column labels of the DataFrame.
For instance, in the following
df = pd.DataFrame(columns=['a', 'b', 'c'])
for x in df:
    print(x)
# Output:
#
# a
# b
# c
x holds the name of each column of df. You can see the same labels in the output of list(df).
This means that when you call any(df.isna()), under the hood any is actually iterating over the column labels of df and checking their truthiness. If at least one label is truthy, it returns True.
In both of your examples the column labels are integers: list(df.isna()) == list(df.columns) == [0, 1, 2, 3, 4], of which only 0 is falsy. Therefore any(df.isna()) returns True in both cases, regardless of whether the data contains NaN.
Solution
The solution is to use DataFrame.any with axis=None instead of the built-in any function.
df.isna().any(axis=None)
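As a quick check of this approach, the sketch below rebuilds both frames from the question (the names clean and dirty are illustrative) and reduces the boolean mask over every element. The to_numpy() variant is shown only as an equivalent alternative, not something the question requires.

```python
import numpy as np
import pandas as pd

# Frame without NaN, and a copy with a single NaN at row 1, column 3.
clean = pd.DataFrame(np.full((4, 5), 10.0))
dirty = clean.copy()
dirty.iloc[1, 3] = np.nan

# axis=None reduces over both axes and returns a single boolean.
clean_has_nan = bool(clean.isna().any(axis=None))
dirty_has_nan = bool(dirty.isna().any(axis=None))

# Equivalent check on the underlying NumPy array.
dirty_has_nan_np = bool(dirty.isna().to_numpy().any())

print(clean_has_nan, dirty_has_nan, dirty_has_nan_np)
```

If you want the per-column result instead, df.isna().any() with the default axis=0 gives the same Series as the apply call in the question.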