Reputation: 1598
I have a dataframe that looks as follows (sample below for reference, original has many more columns):
I am trying to get a list of rows where all columns are null (NaN) except for some specific columns. For example, if those specific columns are col2 and col3, I would get the first and third rows. If the specific columns are just col1, I would get only the last row.
Having just a count of the rows that meet those criteria would work too.
I know how to do this by going through each row and comparing, but is there a quicker way?
Thanks!
Upvotes: 0
Views: 1308
Reputation: 8790
Here is a function for doing so. I used difference to get the dataframe excluding the specified columns, then isna() and all() to find the empty rows:
def null_rows(df, exclude=None):
    exclude = [] if exclude is None else exclude
    # keep rows where every column *not* in `exclude` is NaN
    return df[df[df.columns.difference(exclude)].isna().all(axis=1)]
Example:
import pandas as pd

df = pd.DataFrame({'col1': [None, 3, None, 8],
                   'col2': [1, None, 6, None],
                   'col3': [2, 4, 7, None],
                   'col4': [None, None, None, None],
                   'col5': [None, 5, None, None]})
print(null_rows(df, ['col2', 'col3']))
Output:
   col1  col2  col3  col4  col5
0   NaN   1.0   2.0  None   NaN
2   NaN   6.0   7.0  None   NaN
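Since the question mentions that a count of the matching rows would also work, the same boolean mask can be summed instead of used as a filter. A minimal sketch (count_null_rows is just a hypothetical variant of the function above):

def count_null_rows(df, exclude=None):
    # count rows where every column outside `exclude` is NaN
    exclude = [] if exclude is None else exclude
    return df[df.columns.difference(exclude)].isna().all(axis=1).sum()

print(count_null_rows(df, ['col2', 'col3']))  # 2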
Upvotes: 1
Reputation: 150745
You can try:
# specific columns that are allowed to hold data
cols = ['col1', 'col2']

# keep rows where every remaining column is NaN
df[df.drop(columns=cols).isna().all(axis=1)]
That does not check whether you actually have data in cols. If you require that, you can do:
other_nan = df.drop(columns=cols).isna().all(axis=1)   # all other columns are NaN
chosen_notna = df[cols].notna().any(axis=1)            # at least one chosen column has data
df[other_nan & chosen_notna]
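To illustrate the difference between the two masks, here is a sketch with hypothetical data (an assumption, not from the question) where one row is entirely NaN:

import pandas as pd
import numpy as np

# hypothetical data: row 0 has data only in cols, row 2 is entirely NaN
df = pd.DataFrame({'col1': [1.0, 2.0, np.nan],
                   'col2': [np.nan, 3.0, np.nan],
                   'col3': [np.nan, 4.0, np.nan]})
cols = ['col1']

other_nan = df.drop(columns=cols).isna().all(axis=1)
chosen_notna = df[cols].notna().any(axis=1)

print(df[other_nan])                 # rows 0 and 2 (the all-NaN row slips in)
print(df[other_nan & chosen_notna])  # row 0 only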
Upvotes: 2