Reputation: 8628
Given this dataframe, how can I select only those rows that have "Col2" equal to NaN?
df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)], columns=["Col1", "Col2", "Col3"])
which looks like:
   Col1  Col2  Col3
0     0   1.0   2.0
1     0   NaN   0.0
2     0   0.0   NaN
3     0   1.0   2.0
4     0   1.0   2.0
The result should be this one:
   Col1  Col2  Col3
1     0   NaN   0.0
Upvotes: 183
Views: 226968
Reputation: 21
Another option is to build a boolean mask with isna and index the dataframe with it:
df[df["Col2"].isna()]
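To make the one-liner above easier to try, here is a minimal self-contained sketch. It rebuilds the frame from the question with explicit lists instead of range(3), and the names mask and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written out as plain lists.
df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# Boolean Series: True where Col2 is missing.
mask = df["Col2"].isna()

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Splitting the mask out into its own variable is optional, but it makes it easy to reuse or negate (~mask) later.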
Upvotes: 1
Reputation:
If you want to select rows with at least one NaN value, then you could use isna + any on axis=1:
df[df.isna().any(axis=1)]
If you want to select rows with a certain number of NaN values, then you could use isna + sum on axis=1 + gt. For example, the following will fetch rows with at least 2 NaN values:
df[df.isna().sum(axis=1)>1]
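A small sketch of the counting approach may help here. The frame below is made up for illustration (it is not the one from the question) so that different rows carry 0, 1, 2 and 3 missing values; the variable names are hypothetical too.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a different number of NaNs in each row.
df = pd.DataFrame({
    "Col1": [0, np.nan, np.nan, 3, np.nan],
    "Col2": [1, np.nan, 5, np.nan, np.nan],
    "Col3": [2, np.nan, 6, 7, 8],
})

# Number of missing values in each row.
nan_counts = df.isna().sum(axis=1)

# Rows with at least 2 NaNs; .gt(1) is the method form of "> 1".
at_least_two = df[nan_counts.gt(1)]

# Rows with exactly 1 NaN, using .eq instead of .gt.
exactly_one = df[nan_counts.eq(1)]

print(nan_counts.tolist())
print(at_least_two)
print(exactly_one)
```

Keeping the counts in a separate Series means any comparison (gt, ge, eq, between) can be swapped in without recomputing isna.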
If you want to limit the check to specific columns, you could select them first, then check:
df[df[['Col1', 'Col2']].isna().any(axis=1)]
If you want to select rows with all NaN values, you could use isna + all on axis=1:
df[df.isna().all(axis=1)]
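The difference between any and all, and the effect of restricting the check to a few columns, is easiest to see side by side. This is only a sketch on a made-up frame; subset and the result names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely NaN, row 4 is NaN only in Col1 and Col2.
df = pd.DataFrame({
    "Col1": [0, np.nan, np.nan, 3, np.nan],
    "Col2": [1, np.nan, 5, np.nan, np.nan],
    "Col3": [2, np.nan, 6, 7, 8],
})

subset = ["Col1", "Col2"]

# At least one NaN among the chosen columns.
any_in_subset = df[df[subset].isna().any(axis=1)]

# NaN in every one of the chosen columns.
all_in_subset = df[df[subset].isna().all(axis=1)]

# NaN in every column of the whole frame.
all_in_frame = df[df.isna().all(axis=1)]

print(any_in_subset.index.tolist())
print(all_in_subset.index.tolist())
print(all_in_frame.index.tolist())
```

Note that the mask is computed on the column subset, but it indexes the full frame, so the result still contains every column.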
If you want to select rows with no NaN values, you could use notna + all on axis=1:
df[df.notna().all(axis=1)]
This is equivalent to:
df[df['Col1'].notna() & df['Col2'].notna() & df['Col3'].notna()]
which could become tedious if there are many columns. Instead, you could use functools.reduce to chain & operators:
import functools, operator
df[functools.reduce(operator.and_, (df[i].notna() for i in df.columns))]
or numpy.logical_and.reduce:
import numpy as np
df[np.logical_and.reduce([df[i].notna() for i in df.columns])]
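As a quick sanity check, the three ways of building the "no NaN anywhere" mask can be compared directly. This sketch uses the data from the question written out as plain lists; the mask names are illustrative.

```python
import functools
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# 1) Frame-wide notna, then require every column to be present.
mask_all = df.notna().all(axis=1)

# 2) Chain the per-column masks with & via functools.reduce (gives a Series).
mask_reduce = functools.reduce(operator.and_, (df[c].notna() for c in df.columns))

# 3) Same reduction through NumPy (gives a plain boolean ndarray).
mask_numpy = np.logical_and.reduce([df[c].notna() for c in df.columns])

# All three select the same rows.
print(df[mask_all].index.tolist())
print(mask_all.equals(mask_reduce))
print(np.array_equal(mask_all.to_numpy(), mask_numpy))
```

The only practical difference shown here is the return type: the first two are pandas Series aligned on the index, while the NumPy version is a bare array, which still works for boolean indexing.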
If you want to filter for rows where some column has no NaN using query, you could do so by passing the engine='python' parameter:
df.query('Col2.notna()', engine='python')
or use the fact that NaN != NaN, like @MaxU - stop WAR against UA:
df.query('Col2==Col2')
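Here is a short sketch that puts the query variants next to each other on the frame from the question, so it is clear which one keeps the NaN rows and which one drops them. The result names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# NaN is never equal to itself, so "Col2 == Col2" is False exactly on the NaN rows.
without_nan = df.query("Col2 == Col2")

# The negated comparison keeps only the rows where Col2 is NaN.
with_nan = df.query("Col2 != Col2")

# Method-call form, evaluated with the Python engine as suggested above.
without_nan_method = df.query("Col2.notna()", engine="python")

print(with_nan.index.tolist())
print(without_nan.index.tolist())
print(without_nan.equals(without_nan_method))
```

The self-comparison trick is compact but a bit cryptic to readers who have not seen it, so a comment next to it is probably worth the extra line.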
Upvotes: 12
Reputation: 210852
@qbzenker provided the most idiomatic method, IMO.
Here are a few alternatives:
In [28]: df.query('Col2 != Col2')  # Using the fact that: np.nan != np.nan
Out[28]:
   Col1  Col2  Col3
1     0   NaN   0.0

In [29]: df[np.isnan(df.Col2)]
Out[29]:
   Col1  Col2  Col3
1     0   NaN   0.0
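One caveat that may be worth knowing about the np.isnan variant: it works on numeric columns like Col2, but as far as I know it raises a TypeError on object-dtype data, whereas isna handles both. The sketch below illustrates this; obj and isnan_failed are hypothetical names used only for the demonstration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# On a float column, np.isnan and Series.isna produce the same mask.
numpy_mask = np.isnan(df["Col2"])
pandas_mask = df["Col2"].isna()
print(df[numpy_mask].index.tolist())

# Hypothetical object-dtype column with a missing entry.
obj = pd.Series(["a", None, "b"], dtype=object)

# np.isnan has no implementation for object arrays, so this is expected to fail.
isnan_failed = False
try:
    np.isnan(obj)
except TypeError:
    isnan_failed = True

# isna copes with None in an object column.
print(obj.isna().tolist())
```

So for a known numeric column either form is fine, and isna is the safer default when the dtype is not guaranteed.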
Upvotes: 16