Delete dataframe rows based upon two dependent conditions

Question

I have a fairly large dataframe (a few hundred columns) and I want to perform the following operation on it. I am using a toy dataframe below with a simple condition to illustrate what I need.

For every row: Condition #1: Check two of the columns for a value of zero (0). If this is true, keep the row and move on to the next. If either column has a value of zero (0), the condition is True.

If Condition #1 is False (no zeros in either column 1 or 4) Check all remaining columns in the row. If any of the remaining columns has a value of zero, drop the row.

I would like the filtered dataframe returned as a new, separate dataframe.

My code so far:

# https://codereview.stackexchange.com/questions/185389/dropping-rows-from-a-pandas-dataframe-where-some-of-the-columns-have-value-0/185390
# https://thispointer.com/python-pandas-how-to-drop-rows-in-dataframe-by-conditions-on-column-values/
# https://stackoverflow.com/questions/29763620/how-to-select-all-columns-except-one-column-in-pandas

import pandas as pd

df = pd.DataFrame({'Col1': [7, 6, 0, 1, 8],
                   'Col2': [0.5, 0.5, 0, 0, 7],
                   'Col3': [0, 0, 3, 3, 6],
                   'Col4': [7, 0, 6, 4, 5]})

print(df)
print()

exclude = ['Col1', 'Col4']
all_but_1_and_4 = df[df.columns.difference(exclude)]        # Filter out columns 1 and 4
print(all_but_1_and_4)
print()


def delete_rows(row):
    if row['Col1'] == 0 or row['Col4'] == 0:    # Is the value in either Col1 or Col4 zero(0)
        skip = True                             # If it is, keep the row
        if not skip:                            # If not, check the second condition
            is_zero = all_but_1_and_4.apply(lambda x: 0 in x.values, axis=1).any()      # Are any values in the remaining columns zero(0)
            if is_zero:                         # If any of the remaining columns has a value of zero(0)
                pass
                # drop the row being analyzed   # Drop the row.


new_df = df.apply(delete_rows, axis=1)
print(new_df)

I don't know how to actually drop the row if both of my conditions are met.

In my toy dataframe, rows 1, 2 and 4 should be kept, 0 and 3 dropped.

I do not want to manually check all columns for step 2 because there are several hundred. That is why I filtered using .difference().

BENY · Accepted Answer

What I will do

s1=df[exclude].eq(0).any(1)
s2=df[df.columns.difference(exclude)].eq(0).any(1)

~(~s1&s2) #s1 | ~s2
Out[97]: 
0    False
1     True
2     True
3    False
4     True
dtype: bool
yourdf=df[s1 | ~s2].copy()

Delete dataframe rows based upon two dependent conditions

Answers (2)

Related Questions