ShanZhengYang

Reputation: 17651

How do you delete only specific rows in a Pandas DataFrame?

I have a pandas DataFrame in which some rows contain NaN values:

import pandas as pd
import numpy as np
df = pd.DataFrame(data)
df 

        one       two     three four   five
a  0.469112 -0.282863 -1.509059  bar   True
b       NaN  1.224234  7.823421  bar  False
c -1.135632  1.212112 -0.173215  bar  False
d       NaN       NaN       NaN  NaN   True
e  0.119209 -1.044236 -0.861849  bar   True
f -2.104569 -0.494929  1.071804  bar  False

I could drop every row that contains a NaN using df.dropna()

However, I only want to drop certain rows. For example, if there is a NaN in the one column, then that row should be dropped.

My solution is to create a new DataFrame:

df[df.one != 'Nan']

How else could this be done?

Upvotes: 0

Views: 971

Answers (1)

EdChum

Reputation: 394399

Use loc and pass a boolean mask generated from notnull:

In [107]:
df.loc[df['one'].notnull()]

Out[107]:
        one       two     three four   five
a  0.469112 -0.282863 -1.509059  bar   True
c -1.135632  1.212112 -0.173215  bar  False
e  0.119209 -1.044236 -0.861849  bar   True
f -2.104569 -0.494929  1.071804  bar  False

The mask itself looks like this:

In [109]:
df['one'].notnull()

Out[109]:
a     True
b    False
c     True
d    False
e     True
f     True
Name: one, dtype: bool
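
As an alternative to the mask, dropna accepts a subset argument that limits the NaN check to the listed columns. The following is only a sketch: the question never shows `data`, so the frame here is a hand-built reconstruction from the printed values, and the names `masked` and `dropped` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame; the original `data` is not shown.
df = pd.DataFrame(
    {
        "one": [0.469112, np.nan, -1.135632, np.nan, 0.119209, -2.104569],
        "two": [-0.282863, 1.224234, 1.212112, np.nan, -1.044236, -0.494929],
        "three": [-1.509059, 7.823421, -0.173215, np.nan, -0.861849, 1.071804],
        "four": ["bar", "bar", "bar", np.nan, "bar", "bar"],
        "five": [True, False, False, True, True, False],
    },
    index=list("abcdef"),
)

# Boolean-mask approach from the answer above.
masked = df.loc[df["one"].notnull()]

# dropna restricted to column "one": rows b and d are removed,
# NaN values in other columns are not considered.
dropped = df.dropna(subset=["one"])

print(dropped)
```

Both approaches should return the same four rows (a, c, e, f). Which one reads better is largely a matter of taste; the mask form is easier to combine with other conditions.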

You can't test for NaN values using == or !=, because by design NaN == NaN is False. Use notnull (or isnull) instead.
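
To make the comparison behaviour concrete, here is a small sketch. The Series `s` and its values are invented for illustration and are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# Made-up example Series with one missing value.
s = pd.Series([1.0, np.nan, 3.0], index=list("abc"))

# Equality against NaN is False everywhere, so it cannot locate the NaN.
eq_mask = s == np.nan

# Comparing floats against the string 'Nan' (as in the question) is True
# for every element, so no rows would be filtered out.
str_mask = s != "Nan"

# isna / notna are the intended way to detect missing values.
isna_mask = s.isna()

print(eq_mask.tolist())
print(str_mask.tolist())
print(isna_mask.tolist())
```

This is why the attempt in the question, df[df.one != 'Nan'], does not drop anything: the condition holds for every row.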

Upvotes: 2
