Gerhard

Reputation: 2035

How to filter in NaN (pandas)?

I have a pandas dataframe (df), and I want to do something like:

newdf = df[(df.var1 == 'a') & (df.var2 == NaN)]

I've tried replacing NaN with np.NaN, 'NaN', 'nan', etc., but nothing evaluates to True. There's no pd.NaN.

I can use df.fillna(np.nan) before evaluating the above expression, but that feels hackish, and I wonder if it will interfere with other pandas operations that rely on being able to identify pandas-format NaNs later.

I get the feeling there should be an easy answer to this question, but somehow it has eluded me.

Upvotes: 145

Views: 320085

Answers (5)

Gil Baggio

Reputation: 14003

filtered_df = df[df['var2'].isna()]

This returns only the rows that have NaN in the 'var2' column.

Note: "Series.isnull is an alias for Series.isna."
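
To apply this to the filter in the question, the isna() mask can be combined with the var1 condition. This is a minimal sketch; the column names follow the question, but the data is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data: column names come from the question, values are assumed.
df = pd.DataFrame({
    "var1": ["a", "a", "b", "a"],
    "var2": [1.0, np.nan, np.nan, 3.0],
})

# Both conditions are boolean Series combined element-wise with &.
# The parentheses around the == comparison are required because & binds
# more tightly than ==.
newdf = df[(df["var1"] == "a") & df["var2"].isna()]

print(newdf)
```

Only the row where var1 is 'a' and var2 is missing should remain.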

Upvotes: 172

Mohammad Shalaby

Reputation: 299

df[df['var'].isna()]

where "var" is the column name

Upvotes: 29

rachwa

Reputation: 2310

You can also use query here:

df.query('var2 != var2')

This works since np.nan != np.nan.
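
As a rough sketch of how the self-comparison trick behaves, here it is on a small made-up frame with a float column (the data is assumed, and the second query adds the var1 condition from the question):

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({
    "var1": ["a", "a", "b"],
    "var2": [1.0, np.nan, np.nan],
})

# In a float column, NaN is the only value that is unequal to itself,
# so this expression is True exactly on the NaN rows.
nan_rows = df.query("var2 != var2")

# The same trick combined with the var1 condition.
newdf = df.query("var1 == 'a' and var2 != var2")

print(nan_rows)
print(newdf)
```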

Upvotes: 2

Mark Whitfield

Reputation: 2528

The == comparison in the question doesn't work because NaN isn't equal to anything, including NaN. Use pd.isnull(df.var2) instead.
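
A short illustration of the difference, using a made-up Series rather than the asker's data:

```python
import numpy as np
import pandas as pd

# NaN never compares equal, not even to itself.
print(np.nan == np.nan)  # False

# Illustrative Series with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

# An element-wise == against NaN is therefore False for every entry...
eq_mask = s == np.nan

# ...while pd.isnull flags the missing entry.
null_mask = pd.isnull(s)

print(eq_mask.tolist())
print(null_mask.tolist())
```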

Upvotes: 129

NicholasM

Reputation: 4673

Pandas uses NumPy's NaN value. Use numpy.isnan to obtain a Boolean vector from a pandas Series.
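
A minimal sketch of this approach on made-up data follows. One caveat worth knowing: numpy.isnan only accepts numeric input, so on an object-dtype column (for example one that mixes strings and NaN) it raises a TypeError, whereas Series.isna still works:

```python
import numpy as np
import pandas as pd

# Numeric Series: np.isnan returns a boolean mask.
s = pd.Series([1.0, np.nan, 3.0])
mask = np.isnan(s)
print(mask.tolist())

# Object-dtype Series (illustrative): np.isnan does not support it.
obj = pd.Series(["a", np.nan], dtype=object)
try:
    np.isnan(obj)
    raised = False
except TypeError:
    raised = True
print(raised)

# Series.isna handles the object column without error.
print(obj.isna().tolist())
```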

Upvotes: 10
