GNUk
GNUk

Reputation: 13

How to create a seperate dataframe of rows that contain NaN with pandas

Is it possible to create a new dataFrame using pandas that contains any row that has NaN in any column from an existing datafram to be reviewed by a person?

I'm able to get rows that contain NaN in a specific column with: df_nan = df[pd.isna(df["sales_person"])]

but is there a way to do this without having to list out each column name?

Upvotes: 0

Views: 490

Answers (3)

Balaji Ambresh
Balaji Ambresh

Reputation: 5012

Here you go:

df_nan = df[df.apply(lambda row: any(pd.isna(row)), axis=1)]

Upvotes: 0

Tom
Tom

Reputation: 8800

This works:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> #making fake data, we want rows 0 and 1 to be output
>>> df = pd.DataFrame()
>>> df['Col1'] = [1,np.nan,4,1,10]
>>> df['Col2'] = [np.nan,3,4,5,1]
>>>
>>> nas = df.apply(pd.isnull, axis=1) #boolean dataframe showing np.nan positions
>>> any_nas = nas.any(axis=1)         #rows where there are any True values
>>> only_nas = df[any_nas].copy()     #original dataframe sliced by the rows with True values 
>>>
>>> print(df.head());
   Col1  Col2
0   1.0   NaN
1   NaN   3.0
2   4.0   4.0
3   1.0   5.0
4  10.0   1.0

>>> print(only_nas.head())
   Col1  Col2
0   1.0   NaN
1   NaN   3.0

Upvotes: 0

TayTay
TayTay

Reputation: 7170

Since you didn't include your data, I created some as an example:

import pandas as pd
import numpy as np

X = np.random.RandomState(1).rand(10, 5)
X[X < 0.1] = np.nan

df = pd.DataFrame.from_records(X)
print(df)
#           0         1         2         3         4
# 0  0.417022  0.720324       NaN  0.302333  0.146756
# 1       NaN  0.186260  0.345561  0.396767  0.538817
# 2  0.419195  0.685220  0.204452  0.878117       NaN
# 3  0.670468  0.417305  0.558690  0.140387  0.198101
# 4  0.800745  0.968262  0.313424  0.692323  0.876389
# 5  0.894607       NaN       NaN  0.169830  0.878143
# 6       NaN  0.421108  0.957890  0.533165  0.691877
# 7  0.315516  0.686501  0.834626       NaN  0.750144
# 8  0.988861  0.748166  0.280444  0.789279  0.103226
# 9  0.447894  0.908596  0.293614  0.287775  0.130029

With that data, it's pretty straight forward to find any NaNs on a row:

nan_mask = pd.isnull(df).any(axis=1)
df[nan_mask]
#           0         1         2         3         4
# 0  0.417022  0.720324       NaN  0.302333  0.146756
# 1       NaN  0.186260  0.345561  0.396767  0.538817
# 2  0.419195  0.685220  0.204452  0.878117       NaN
# 5  0.894607       NaN       NaN  0.169830  0.878143
# 6       NaN  0.421108  0.957890  0.533165  0.691877
# 7  0.315516  0.686501  0.834626       NaN  0.750144

Now, only rows with any NaNs in them appear.

Upvotes: 3

Related Questions