Reputation: 2365
I am trying to delete the columns where the number of non-zero values is less than a given number. This is the code I have, but it returns the same DataFrame. What am I doing wrong?
df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])
   0  1  2  3
0  1  0  0  0
1  0  0  1  0
df = df.loc[:, (df.astype(bool).sum(axis=0) <= max_number_of_zeros)]
   0  1  2  3
0  1  0  0  0
1  0  0  1  0
Edit: an example:
   0  1  2  3
0  1  0  0  0
1  2  0  1  0
2  0  2  3  4
3  1  1  1  1
For value=2, the output would be columns 0 and 2 of:
   0  1  2  3
0  1  0  0  0
1  2  0  1  0
2  0  2  3  4
3  1  1  1  1
Upvotes: 1
Views: 60
Reputation: 863741
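I think you need to change the boolean mask to df.eq(0), which is the same as df == 0, because df.astype(bool).sum(axis=0) counts the non-zero values per column, not the zeros. The condition also needs to change from <= to <: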
max_number_of_zeros = 2
# keep only the columns that contain fewer than max_number_of_zeros zeros
df = df.loc[:, df.eq(0).sum(axis=0) < max_number_of_zeros]
print(df)
   0  2
0  1  0
1  2  1
2  0  3
3  1  1
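For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question's edit; the names zeros_per_column and filtered are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Sample frame taken from the question's edit
df = pd.DataFrame([[1, 0, 0, 0],
                   [2, 0, 1, 0],
                   [0, 2, 3, 4],
                   [1, 1, 1, 1]])

max_number_of_zeros = 2

# Count the zeros in each column (axis=0 sums down the rows)
zeros_per_column = df.eq(0).sum(axis=0)

# Keep only the columns with fewer zeros than the limit
filtered = df.loc[:, zeros_per_column < max_number_of_zeros]

print(filtered.columns.tolist())  # [0, 2]
```

Assigning to a new name (filtered) instead of overwriting df keeps the original frame available for inspection.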
Detail:
print(df.eq(0))
       0      1      2      3
0  False   True   True   True
1  False   True  False   True
2   True  False  False  False
3  False  False  False  False
print(df.eq(0).sum(axis=0))
0    1
1    2
2    1
3    2
dtype: int64
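To see why the mask from the question kept every column, it may help to compare the two counts side by side. This is a rough sketch on the first frame from the question; max_number_of_zeros = 2 is an assumed value, since the question does not show what it was set to.

```python
import pandas as pd

# First sample frame from the question
df = pd.DataFrame([[1, 0, 0, 0], [0, 0, 1, 0]])
max_number_of_zeros = 2  # assumed value, not given in the question

# astype(bool) maps non-zero values to True, so this counts NON-zeros per column
nonzeros = df.astype(bool).sum(axis=0)

# eq(0) marks the zeros, so this counts zeros per column
zeros = df.eq(0).sum(axis=0)

# Mask from the question: every non-zero count is <= 2, so all columns are kept
original_mask = nonzeros <= max_number_of_zeros

# Mask that tests the zero count: only columns 0 and 2 pass
fixed_mask = zeros < max_number_of_zeros

print(original_mask.tolist())  # [True, True, True, True]
print(fixed_mask.tolist())     # [True, False, True, False]
```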
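EDIT: An alternative that keeps astype(bool) is to subtract the non-zero count from the number of rows, which gives the zero count per column:
max_number_of_zeros = 2
# len(df) is the number of rows, so this is the number of zeros in each column
df = df.loc[:, len(df) - df.astype(bool).sum(axis=0) < max_number_of_zeros]
print(df)
   0  2
0  1  0
1  2  1
2  0  3
3  1  1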
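If this is needed in more than one place, the logic could be wrapped in a small helper. The sketch below is only an illustration: the function name drop_columns_with_many_zeros and the frame wide are hypothetical. It uses a non-square frame on purpose, because the zero count per column is the number of rows minus the non-zero count, so len(frame) is the right total regardless of how many columns there are.

```python
import pandas as pd


def drop_columns_with_many_zeros(frame, max_number_of_zeros):
    """Return frame without the columns holding max_number_of_zeros or more zeros.

    Hypothetical helper, not part of pandas.
    """
    # rows minus non-zero values = zeros in each column
    zeros_per_column = len(frame) - frame.astype(bool).sum(axis=0)
    return frame.loc[:, zeros_per_column < max_number_of_zeros]


# Non-square example: 2 rows, 5 columns
wide = pd.DataFrame([[1, 0, 0, 0, 5],
                     [0, 0, 1, 0, 6]])

result = drop_columns_with_many_zeros(wide, 2)
print(result.columns.tolist())  # [0, 2, 4]
```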
Upvotes: 1