Irina

Reputation: 1598

Pandas - Get count of rows where all values are null except for a set of columns

I have a dataframe that looks as follows (sample below for reference, original has many more columns):

[sample table for dataframe]

I am trying to get a list of rows where all columns are null (NaN) except for some specific columns. For example, if those specific columns are col2 and col3, I would get the first and third rows. If the specific columns are just col1, I would get only the last row.

Having just a count of the rows that meet those criteria would work too.

I know how to do this by going through each row and comparing, but is there a quicker way?

Thanks!

Upvotes: 0

Views: 1308

Answers (2)

Tom

Reputation: 8790

Here is a function for doing so. I used difference to get the dataframe excluding the specified columns, then used isna() and all() to find the empty rows:

def null_rows(df, exclude=None):
    exclude = [] if exclude is None else exclude
    return df[df[df.columns.difference(exclude)].isna().all(axis=1)]

Example:

import pandas as pd

df = pd.DataFrame({'col1': [None, 3, None, 8],
                   'col2': [1, None, 6, None],
                   'col3': [2, 4, 7, None],
                   'col4': [None, None, None, None],
                   'col5': [None, 5, None, None]})

print(null_rows(df, ['col2', 'col3']))

Output:

   col1  col2  col3  col4  col5
0   NaN   1.0   2.0  None   NaN
2   NaN   6.0   7.0  None   NaN
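
Since the question says a count of the matching rows would also work, the same boolean mask can be summed instead of used to index. A small self-contained sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'col1': [None, 3, None, 8],
                   'col2': [1, None, 6, None],
                   'col3': [2, 4, 7, None],
                   'col4': [None, None, None, None],
                   'col5': [None, 5, None, None]})

# Count rows where every column outside col2/col3 is NaN
count = df[df.columns.difference(['col2', 'col3'])].isna().all(axis=1).sum()
print(count)  # 2
```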

Upvotes: 1

Quang Hoang

Reputation: 150745

You can try:

# specific columns
cols = ['col1','col2']

df[df.drop(cols, axis=1).isna().all(axis=1)]

That does not check whether the rows actually have data in cols. If you require that, you can do:

other_nan = df.drop(cols, axis=1).isna().all(axis=1)
chosen_notna = df[cols].notna().any(axis=1)

df[other_nan & chosen_notna]
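
If only the count is needed, the combined mask can be summed directly. A sketch using the sample data from the other answer (the values are illustrative):

```python
import pandas as pd

# Sample data mirroring the other answer's example
df = pd.DataFrame({'col1': [None, 3, None, 8],
                   'col2': [1, None, 6, None],
                   'col3': [2, 4, 7, None],
                   'col4': [None, None, None, None],
                   'col5': [None, 5, None, None]})

cols = ['col1', 'col2']
other_nan = df.drop(cols, axis=1).isna().all(axis=1)  # NaN everywhere outside cols
chosen_notna = df[cols].notna().any(axis=1)           # at least one value inside cols
count = (other_nan & chosen_notna).sum()
print(count)  # 1 (only the last row qualifies)
```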

Upvotes: 2
