Reputation: 1287
Most efficient way of splitting the row which contains with and without NaN in pandas dataframe.
input :- ID Gender Dependants Income Education Married
1 Male 2 500 Graduate Yes
2 NaN 4 2500 Graduate No
3 Female 3 NaN NaN Yes
4 Male NaN 7000 Graduate Yes
5 Female 4 500 Graduate NaN
6 Female 2 4500 Graduate Yes
The expected output without NaN is,
ID Gender Dependants Income Education Married
1 Male 2 500 Graduate Yes
6 Female 2 4500 Graduate Yes
The expected output with NaN is,
ID Gender Dependants Income Education Married
2 NaN 4 2500 Graduate No
3 Female 3 NaN NaN Yes
4 Male NaN 7000 Graduate Yes
5 Female 4 500 Graduate NaN
Upvotes: 5
Views: 2163
Reputation: 863166
Use boolean indexing
with check missing values and any
for check at least one True
per rows:
mask = df.isnull().any(axis=1)
df1 = df[~mask]
df2 = df[mask]
print (df1)
ID Gender Dependants Income Education Married
0 1 Male 2.0 500.0 Graduate Yes
5 6 Female 2.0 4500.0 Graduate Yes
print (df2)
ID Gender Dependants Income Education Married
1 2 NaN 4.0 2500.0 Graduate No
2 3 Female 3.0 NaN NaN Yes
3 4 Male NaN 7000.0 Graduate Yes
4 5 Female 4.0 500.0 Graduate NaN
Details:
print (df.isnull())
ID Gender Dependants Income Education Married
0 False False False False False False
1 False True False False False False
2 False False False True True False
3 False False True False False False
4 False False False False False True
5 False False False False False False
print (mask)
0 False
1 True
2 True
3 True
4 True
5 False
dtype: bool
And you can always use a more readable way of the previous code where you don't need to invert the mask:
mask = df.notna().any(axis=1)
df1 = df[mask]
Same exact result.
Upvotes: 8