Reputation: 43
Say I have a big DataFrame (>10000 rows) in which some rows contain one or more nulls. How do I remove all rows containing a null in one or more of their columns from the original DataFrame and put those rows into another DataFrame?
e.g.:
Original DataFrame:
       a    b  c
1  "foo"    5  3
2  "bar"    9  1
3    NaN    5  4
4  "foo"  NaN  1
Non-Null DataFrame:
       a  b  c
1  "foo"  5  3
2  "bar"  9  1
Null containing DataFrame:
       a    b  c
1    NaN    5  4
2  "foo"  NaN  1
Upvotes: 1
Views: 66
Reputation: 863166
Use DataFrame.isna to check for missing values:

print (df.isna())
#print (df.isnull())  # isnull is an alias of isna, same output
a b c
1 False False False
2 False False False
3 True False False
4 False True False
Then test whether each row contains at least one True with DataFrame.any:

mask = df.isna().any(axis=1)
# for older pandas versions use the isnull alias
#mask = df.isnull().any(axis=1)
print (mask)
1 False
2 False
3 True
4 True
dtype: bool
Finally, filter by boolean indexing; ~ inverts the boolean mask:
df1 = df[~mask]
df2 = df[mask]
print (df1)
a b c
1 foo 5.0 3
2 bar 9.0 1
print (df2)
a b c
3 NaN 5.0 4
4 foo NaN 1
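For completeness, the non-null part can also be obtained with DataFrame.dropna, which with its default how='any' keeps only complete rows (a compact sketch, equivalent to the mask approach above):

df1 = df.dropna()                 # rows without any NaN
df2 = df[df.isna().any(axis=1)]   # rows with at least one NaN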
Upvotes: 2