Japie
Japie

Reputation: 43

How to separate null and non-null containing rows into two different DataFrames?

Say I have a big DataFrame (>10000 rows) that has some rows containing one or more nulls. How do I remove all the rows containing a null in one or more of its columns from the original DataFrame and putting the rows into another DataFrame?

e.g.:

Original DataFrame:

         a    b    c
1    "foo"    5    3
2    "bar"    9    1
3      NaN    5    4
4    "foo"  NaN    1

Non-Null DataFrame:

         a    b    c
1    "foo"    5    3
2    "bar"    9    1

Null containing DataFrame:

         a    b    c
1      NaN    5    4
2    "foo"  NaN    1

Upvotes: 1

Views: 66

Answers (1)

jezrael
jezrael

Reputation: 863166

Use DataFrame.isna for checking missing values:

print (df.isna())
#print (df.isnull())
       a      b      c
1  False  False  False
2  False  False  False
3   True  False  False
4  False   True  False

And test if at least True per row by DataFrame.any:

mask = df.isna().any(axis=1)
#oldier pandas versions
mask = df.isnull().any(axis=1)
print (mask)
1    False
2    False
3     True
4     True
dtype: bool

Last filter by boolean indexing - ~ is for inverting boolean mask:

df1 = df[~mask]
df2 = df[mask]

print (df1)
     a    b  c
1  foo  5.0  3
2  bar  9.0  1

print (df2)
     a    b  c
3  NaN  5.0  4
4  foo  NaN  1

Upvotes: 2

Related Questions