Reputation: 13
Is it possible to create a new dataFrame using pandas that contains any row that has NaN in any column from an existing datafram to be reviewed by a person?
I'm able to get rows that contain NaN in a specific column with:
df_nan = df[pd.isna(df["sales_person"])]
but is there a way to do this without having to list out each column name?
Upvotes: 0
Views: 490
Reputation: 5012
Here you go:
df_nan = df[df.apply(lambda row: any(pd.isna(row)), axis=1)]
Upvotes: 0
Reputation: 8800
This works:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> #making fake data, we want rows 0 and 1 to be output
>>> df = pd.DataFrame()
>>> df['Col1'] = [1,np.nan,4,1,10]
>>> df['Col2'] = [np.nan,3,4,5,1]
>>>
>>> nas = df.apply(pd.isnull, axis=1) #boolean dataframe showing np.nan positions
>>> any_nas = nas.any(axis=1) #rows where there are any True values
>>> only_nas = df[any_nas].copy() #original dataframe sliced by the rows with True values
>>>
>>> print(df.head());
Col1 Col2
0 1.0 NaN
1 NaN 3.0
2 4.0 4.0
3 1.0 5.0
4 10.0 1.0
>>> print(only_nas.head())
Col1 Col2
0 1.0 NaN
1 NaN 3.0
Upvotes: 0
Reputation: 7170
Since you didn't include your data, I created some as an example:
import pandas as pd
import numpy as np
X = np.random.RandomState(1).rand(10, 5)
X[X < 0.1] = np.nan
df = pd.DataFrame.from_records(X)
print(df)
# 0 1 2 3 4
# 0 0.417022 0.720324 NaN 0.302333 0.146756
# 1 NaN 0.186260 0.345561 0.396767 0.538817
# 2 0.419195 0.685220 0.204452 0.878117 NaN
# 3 0.670468 0.417305 0.558690 0.140387 0.198101
# 4 0.800745 0.968262 0.313424 0.692323 0.876389
# 5 0.894607 NaN NaN 0.169830 0.878143
# 6 NaN 0.421108 0.957890 0.533165 0.691877
# 7 0.315516 0.686501 0.834626 NaN 0.750144
# 8 0.988861 0.748166 0.280444 0.789279 0.103226
# 9 0.447894 0.908596 0.293614 0.287775 0.130029
With that data, it's pretty straight forward to find any NaNs on a row:
nan_mask = pd.isnull(df).any(axis=1)
df[nan_mask]
# 0 1 2 3 4
# 0 0.417022 0.720324 NaN 0.302333 0.146756
# 1 NaN 0.186260 0.345561 0.396767 0.538817
# 2 0.419195 0.685220 0.204452 0.878117 NaN
# 5 0.894607 NaN NaN 0.169830 0.878143
# 6 NaN 0.421108 0.957890 0.533165 0.691877
# 7 0.315516 0.686501 0.834626 NaN 0.750144
Now, only rows with any NaNs in them appear.
Upvotes: 3