Veki
Veki

Reputation: 551

Conditional filtering on dataframe with multiple columns

I have a dataframe with 80 columns. Out of those, there are some 45 columns that I need to check for "all-zero" value filter. If for a row, all those 45 columns have zero as their value, then that row is marked as True. Otherwise as False.

Here's a sample dataframe for this problem:

df  = pd.DataFrame({'col1': [101,102,103,104,105,106,107,108,109,110,111],
                'col2': ['A','B','A','A','A','B','B','A','A','B','B'],
                'col3': [12e10,23e10,34e10,0,56e10,67e10,78e10,89e10,0,12e10,23e10],
                'col4': ['F','F','F','E','E','E','E','E','F','F','F'],
                'col5': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col6': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
                'col7': [12e10,0,34e10,45e10,0,67e10,0,0,0,12e10,23e10],
                'col8': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col9': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col10': [12e10,0,0,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col11': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col12': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
                'col13': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col14': [12e10,0,34e10,45e10,0,67e10,0,0,0,12e10,23e10],
                'col15': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col16': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,0],
                'col17': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,0,0],
                'col18': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col19': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
                'col20': [12e10,0,0,45e10,0,67e10,78e10,0,0,12e10,23e10],
               })

Now, in this example, I need to check the condition on all columns from col5 to col19. col3 and col20 are excluded. And all other categorical columns are excluded too.

I did this to filter out those rows.

df[(df[col5] == 0) & (df[col6] == 0) & (df[col7] == 0) & ... & (df[col19] == 0)]

This will get me rows (col1: 102,105,108,109), as they have all col5 to col19 as zero. Now I need to put True in a new column df['mark'] for these rows, and False for the rest of the rows.

But I believe there has to be some easy way to do this?

Upvotes: 1

Views: 860

Answers (2)

Veki
Veki

Reputation: 551

I tried another way to do this using sum() and map() method to get the requisite values.

df['mark'] = np.sign(df.loc[:,'col5':'col19'].sum(axis=1)).map({-1:False,1:False,0:True})

The solution submitted by Mustafa is much faster and better approach in my case. However, this will help if instead of boolean, you want a specific indicator for negative, positive, and zero values.

Upvotes: 1

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

You can use loc indexing and then check if all entries of rows (axis=1) equals to 0 (eq(0)):

df["mark"] = df.loc[:, "col5": "col19"].eq(0).all(axis=1)

to get

      col1 col2        col3 ...        col19         col20    mark
0    101    A  1.200000e+11 ...  1.200000e+11  1.200000e+11  False
1    102    B  2.300000e+11 ...  0.000000e+00  0.000000e+00   True
2    103    A  3.400000e+11 ...  3.400000e+11  0.000000e+00  False
3    104    A  0.000000e+00 ...  0.000000e+00  4.500000e+11  False
4    105    A  5.600000e+11 ...  0.000000e+00  0.000000e+00   True
5    106    B  6.700000e+11 ...  6.700000e+11  6.700000e+11  False
6    107    B  7.800000e+11 ...  7.800000e+11  7.800000e+11  False
7    108    A  8.900000e+11 ...  0.000000e+00  0.000000e+00   True
8    109    A  0.000000e+00 ...  0.000000e+00  0.000000e+00   True
9    110    B  1.200000e+11 ...  1.200000e+11  1.200000e+11  False
10   111    B  2.300000e+11 ...  2.300000e+11  2.300000e+11  False

If the columns are not consecutive, you can write their names beforehand and pass it above e.g.:

cols_to_look_for = ["col8", "col13", "col21", "col34"]
df["mark"] = df.loc[:, cols_to_look_for].eq(0).all(axis=1)

(In the example you gave in the question, their consecutiveness allowed the slice notation.)

Upvotes: 1

Related Questions