ubuntu_noob

Reputation: 2365

delete columns based on fixed values

I am trying to delete the columns where the number of non-zero values is less than a given number. This is the code I have, but it returns the DataFrame unchanged. What am I doing wrong?

import pandas as pd

df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])



   0  1  2  3
0  1  0  0  0
1  0  0  1  0

df = df.loc[:, (df.astype(bool).sum(axis=0) <= max_number_of_zeros)]

   0  1  2  3
0  1  0  0  0
1  0  0  1  0

Edit: example

   0  1  2  3
0  1  0  0  0
1  2  0  1  0
2  0  2  3  4
3  1  1  1  1

For value=2 the output should keep only columns 0 and 2:

   0  2
0  1  0
1  2  1
2  0  3
3  1  1

Upvotes: 1

Views: 60

Answers (1)

jezrael

Reputation: 863741

I think you need to change the boolean mask to df.eq(0) (the same as df == 0), so that it counts zeros instead of non-zero values, and change the condition from <= to <:

max_number_of_zeros = 2
df = df.loc[:, df.eq(0).sum(axis=0) < max_number_of_zeros]
print(df)
   0  2
0  1  0
1  2  1
2  0  3
3  1  1

Detail (run on the original df, before filtering):

print (df.eq(0))
       0      1      2      3
0  False   True   True   True
1  False   True  False   True
2   True  False  False  False
3  False  False  False  False

print (df.eq(0).sum(axis=0))
0    1
1    2
2    1
3    2
dtype: int64
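
To see why the original mask changed nothing, here is a minimal sketch on the 2-row frame from the question. The threshold of 2 is an assumption, since the question does not say which value was used. astype(bool).sum(axis=0) counts non-zero values, and every one of those counts is already <= 2, so every column is kept:

```python
import pandas as pd

# The 2-row frame from the question
df_small = pd.DataFrame([[1, 0, 0, 0], [0, 0, 1, 0]])
max_number_of_zeros = 2  # assumed threshold, not stated in the question

# astype(bool) turns non-zero values into True, so this counts NON-zeros
nonzeros = df_small.astype(bool).sum(axis=0)     # 1, 0, 1, 0
original_mask = nonzeros <= max_number_of_zeros  # True for every column
unchanged = df_small.loc[:, original_mask]       # same 4 columns as before

# eq(0) marks zeros, so this counts zeros per column
zeros = df_small.eq(0).sum(axis=0)               # 1, 2, 1, 2
fixed = df_small.loc[:, zeros < max_number_of_zeros]
print(fixed.columns.tolist())                    # [0, 2]
```

So the mask itself was valid, it was just counting the wrong thing.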

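EDIT:

If you want to keep astype(bool), count the zeros as the number of rows minus the number of non-zero values:

max_number_of_zeros = 2
df = df.loc[:, len(df) - df.astype(bool).sum(axis=0) < max_number_of_zeros]
print(df)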
   0  2
0  1  0
1  2  1
2  0  3
3  1  1
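
The number of zeros in a column is the number of rows minus the number of non-zero values, so the subtraction should start from the row count len(df). On the 4x4 example the row and column counts happen to be equal, which hides the difference. A rough sketch on a non-square frame, where the helper name drop_columns_with_many_zeros and the frame wide are made up for illustration:

```python
import pandas as pd

def drop_columns_with_many_zeros(df, max_number_of_zeros):
    """Keep only columns with fewer than max_number_of_zeros zeros (hypothetical helper)."""
    # zeros per column = number of rows - non-zero values per column
    zeros_per_column = len(df) - df.astype(bool).sum(axis=0)
    return df.loc[:, zeros_per_column < max_number_of_zeros]

# 2 rows x 5 columns, so len(wide) != len(wide.columns)
wide = pd.DataFrame([[1, 0, 0, 0, 5],
                     [0, 0, 1, 0, 6]])

result = drop_columns_with_many_zeros(wide, 2)
print(result.columns.tolist())  # [0, 2, 4]
```

Using the column count here instead would give 4, 5, 4, 5, 3 "zeros" per column and drop every column.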

Upvotes: 1
