Reputation: 2035
I have a pandas dataframe (df), and I want to do something like:
newdf = df[(df.var1 == 'a') & (df.var2 == NaN)]
I've tried replacing NaN with np.NaN, 'NaN', 'nan', etc., but nothing evaluates to True. There's no pd.NaN.
I can use df.fillna(np.nan) before evaluating the above expression, but that feels hackish, and I wonder if it will interfere with other pandas operations that rely on being able to identify pandas-format NaNs later.
I get the feeling there should be an easy answer to this question, but somehow it has eluded me.
Upvotes: 145
Views: 320085
Reputation: 14003
filtered_df = df[df['var2'].isna()]
This keeps only the rows that have NaN in the 'var2' column.
Note: "Series.isnull is an alias for Series.isna."
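To combine this with the var1 condition from the question, something like the following should work. This is a minimal sketch: the sample data is made up for illustration, and only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "var1": ["a", "a", "b", "a"],
    "var2": [1.0, np.nan, np.nan, 4.0],
})

# Keep rows where var1 is 'a' AND var2 is missing.
newdf = df[(df["var1"] == "a") & (df["var2"].isna())]
print(newdf)
```

The parentheses around each condition matter, because & binds more tightly than == in Python.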
Upvotes: 172
Reputation: 2310
You can also use query here:
df.query('var2 != var2')
This works because np.nan != np.nan.
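A small sketch of the self-comparison trick, assuming a made-up frame with the question's column names. The second query adds the var1 condition from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "var1": ["a", "a", "b", "a"],
    "var2": [1.0, np.nan, np.nan, 4.0],
})

# NaN is the only value that is not equal to itself,
# so this selects exactly the rows where var2 is NaN.
only_nan = df.query("var2 != var2")

# The same trick combined with the var1 filter.
combined = df.query("var1 == 'a' and var2 != var2")
```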
Upvotes: 2
Reputation: 2528
This doesn't work because NaN isn't equal to anything, including NaN. Use pd.isnull(df.var2) instead.
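To see the difference on a toy Series (the values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical values; the middle one is missing.
s = pd.Series([1.0, np.nan, 3.0])

# Equality against NaN is False everywhere, even for the NaN entry.
eq_mask = s == np.nan

# pd.isnull flags the missing entry as expected.
null_mask = pd.isnull(s)

print(eq_mask.tolist())
print(null_mask.tolist())
```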
Upvotes: 129
Reputation: 4673
pandas uses NumPy's NaN value. Use numpy.isnan to obtain a Boolean vector from a pandas Series.
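One caveat worth knowing: numpy.isnan is only defined for numeric data, so as far as I know it raises a TypeError on an object-dtype column (for example, one holding strings). A rough sketch with made-up Series:

```python
import numpy as np
import pandas as pd

# Numeric Series: np.isnan returns a Boolean mask.
numeric = pd.Series([1.0, np.nan, 3.0])
mask = np.isnan(numeric)

# Object-dtype Series (strings plus a missing value): np.isnan is not
# supported for this input type, so fall back to Series.isna.
text = pd.Series(["x", np.nan, "z"], dtype=object)
try:
    np.isnan(text)
    raised = False
except TypeError:
    raised = True

text_mask = text.isna()
```

If the column might hold non-numeric values, isna / pd.isnull is the safer choice.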
Upvotes: 10