pilz2985

Reputation: 257

Filter NaN values in a dataframe column

y = data.loc[data['column1'] != float('NaN'),'column1']

The code above still returns rows with NaN values in 'column1'. I'm not sure what I'm doing wrong. Please help!

Upvotes: 4

Views: 8549

Answers (1)

cs95

Reputation: 402483

NaN, by definition, is not equal to NaN, so a != comparison against NaN is True for every row and filters nothing out.

In [1262]: np.nan == np.nan
Out[1262]: False

Read up about the mathematical concept on Wikipedia.
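To tie this back to the code in the question, here is a minimal sketch of why the original filter keeps every row. The three-row frame is made up for illustration; only the comparison itself comes from the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the asker's data).
df = pd.DataFrame({"column1": [1.0, np.nan, 4.0]})

# Same comparison as in the question: every element, NaN included,
# compares "not equal" to NaN, so the mask is True everywhere.
mask = df["column1"] != float("nan")
all_true = bool(mask.all())

# Because the mask is all True, the NaN row is still selected.
kept = df.loc[mask, "column1"]
print(all_true, len(kept))
```

Since the mask never contains False, .loc returns the column unchanged, which matches the behaviour described in the question.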


Option 1

Using pd.Series.notnull:

df

   column1
0      1.0
1      2.0
2    345.0
3      NaN
4      4.0
5     10.0
6      NaN
7    100.0
8      NaN

y = df.loc[df.column1.notnull(), 'column1']
y

0      1.0
1      2.0
2    345.0
4      4.0
5     10.0
7    100.0
Name: column1, dtype: float64

Option 2

As MSeifert suggested, you could use np.isnan:

y = df.loc[~np.isnan(df.column1), 'column1']
y

0      1.0
1      2.0
2    345.0
4      4.0
5     10.0
7    100.0
Name: column1, dtype: float64
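One caveat worth checking before picking Option 2: np.isnan is a numeric ufunc, so it is only a safe choice when the column has a numeric dtype. The sketch below uses a made-up object-dtype Series to show the difference; notnull handles it, while np.isnan is expected to raise a TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical column holding strings plus a missing value (object dtype).
s = pd.Series(["a", np.nan, "b"], dtype=object)

# notnull works regardless of dtype.
non_missing = s[s.notnull()]

# np.isnan does not support object arrays and raises TypeError.
try:
    np.isnan(s)
    isnan_failed = False
except TypeError:
    isnan_failed = True

print(list(non_missing), isnan_failed)
```

For a float column like the one in the question, both approaches give the same result.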

Option 3

If it's just the one column, call pd.Series.dropna:

y = df.column1.dropna()
y

0      1.0
1      2.0
2    345.0
4      4.0
5     10.0
7    100.0
Name: column1, dtype: float64
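If you want to keep the whole rows rather than just the one column, DataFrame.dropna accepts a subset argument. This is a small sketch under the assumption that the frame has a second column; 'column2' is a hypothetical name added only for illustration.

```python
import numpy as np
import pandas as pd

# 'column2' is hypothetical; the question only mentions 'column1'.
df = pd.DataFrame({
    "column1": [1.0, np.nan, 4.0],
    "column2": ["a", "b", "c"],
})

# Drop rows where 'column1' is NaN, keeping all other columns.
rows = df.dropna(subset=["column1"])
print(rows)
```

The original index labels are preserved, so call reset_index(drop=True) afterwards if you need a contiguous index.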

Upvotes: 3
