Reputation: 551
I have a dataframe with 80 columns. Out of those, there are some 45 columns that I need to check for "all-zero" value filter. If for a row, all those 45 columns have zero as their value, then that row is marked as True. Otherwise as False.
Here's a sample dataframe for this problem:
import pandas as pd

df = pd.DataFrame({'col1': [101,102,103,104,105,106,107,108,109,110,111],
'col2': ['A','B','A','A','A','B','B','A','A','B','B'],
'col3': [12e10,23e10,34e10,0,56e10,67e10,78e10,89e10,0,12e10,23e10],
'col4': ['F','F','F','E','E','E','E','E','F','F','F'],
'col5': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col6': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
'col7': [12e10,0,34e10,45e10,0,67e10,0,0,0,12e10,23e10],
'col8': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col9': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col10': [12e10,0,0,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col11': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col12': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
'col13': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col14': [12e10,0,34e10,45e10,0,67e10,0,0,0,12e10,23e10],
'col15': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col16': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,0],
'col17': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,0,0],
'col18': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
'col19': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
'col20': [12e10,0,0,45e10,0,67e10,78e10,0,0,12e10,23e10],
})
Now, in this example, I need to check the condition on all columns from col5 to col19. col3 and col20 are excluded, as are all the categorical columns.
I did this to filter out those rows.
df[(df['col5'] == 0) & (df['col6'] == 0) & (df['col7'] == 0) & ... & (df['col19'] == 0)]
This will get me rows (col1: 102,105,108,109), as they have all col5 to col19 as zero. Now I need to put True in a new column df['mark'] for these rows, and False for the rest of the rows.
But I believe there has to be an easier way to do this?
Upvotes: 1
Views: 860
Reputation: 551
I tried another way to do this, using the sum() and map() methods to get the requisite values.
import numpy as np

df['mark'] = np.sign(df.loc[:, 'col5':'col19'].sum(axis=1)).map({-1: False, 1: False, 0: True})
The solution submitted by Mustafa is a much faster and better approach in my case. However, this one helps if, instead of a boolean, you want a specific indicator for negative, positive, and zero values.
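A minimal sketch of that three-state use (the column names and labels here are illustrative, not from the question). Note one caveat of the sum-based check: a row where positive and negative values cancel out sums to zero without actually being all-zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 5, -3],
                   'b': [0, 2, 1]})

# Row sums are 0, 7, -2; np.sign turns them into 0, 1, -1,
# which map() then relabels as a three-state indicator.
df['indicator'] = np.sign(df[['a', 'b']].sum(axis=1)).map(
    {-1: 'negative', 0: 'zero', 1: 'positive'})
# df['indicator'] is ['zero', 'positive', 'negative']
```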
Upvotes: 1
Reputation: 18315
You can use loc indexing and then check whether all entries of each row (axis=1) equal 0 (eq(0)):
df["mark"] = df.loc[:, "col5": "col19"].eq(0).all(axis=1)
to get
col1 col2 col3 ... col19 col20 mark
0 101 A 1.200000e+11 ... 1.200000e+11 1.200000e+11 False
1 102 B 2.300000e+11 ... 0.000000e+00 0.000000e+00 True
2 103 A 3.400000e+11 ... 3.400000e+11 0.000000e+00 False
3 104 A 0.000000e+00 ... 0.000000e+00 4.500000e+11 False
4 105 A 5.600000e+11 ... 0.000000e+00 0.000000e+00 True
5 106 B 6.700000e+11 ... 6.700000e+11 6.700000e+11 False
6 107 B 7.800000e+11 ... 7.800000e+11 7.800000e+11 False
7 108 A 8.900000e+11 ... 0.000000e+00 0.000000e+00 True
8 109 A 0.000000e+00 ... 0.000000e+00 0.000000e+00 True
9 110 B 1.200000e+11 ... 1.200000e+11 1.200000e+11 False
10 111 B 2.300000e+11 ... 2.300000e+11 2.300000e+11 False
If the columns are not consecutive, you can list their names beforehand and pass them instead, e.g.:
cols_to_look_for = ["col8", "col13", "col21", "col34"]
df["mark"] = df.loc[:, cols_to_look_for].eq(0).all(axis=1)
(In the example you gave in the question, their consecutiveness allowed the slice notation.)
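If you'd rather not type 45 names by hand, one option (an assumption on my part, not something the question requires) is to build the list from the dtypes, since the excluded columns are categorical. A numeric id column such as col1 would still have to be dropped explicitly:

```python
import pandas as pd

df = pd.DataFrame({'id': ['x', 'y'],   # object dtype, excluded automatically
                   'a': [0, 1],
                   'b': [0.0, 0.0]})

# select_dtypes picks out the numeric columns; eq(0).all(axis=1)
# then marks rows where every selected column is zero.
numeric_cols = df.select_dtypes(include='number').columns
df['mark'] = df[numeric_cols].eq(0).all(axis=1)
# df['mark'] is [True, False]
```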
Upvotes: 1