neversaint
neversaint

Reputation: 64064

Delete Pandas Dataframe columns if there are zeros in more than K rows

I have the following data frame:

import pandas as pd
df = pd.DataFrame({'id':['a','b','c','d','e'],
                   'A':[-14,-90,-90,-96,-91],
                   'B':[-103,0,-110,-114,-114],
                   'D':[0,0,0,0,0],
                   'C':[-101,0,-110,0,0]})

It looks like this:

    A    B    C  D id
0 -14 -103 -101  0  a
1 -90    0    0  0  b
2 -90 -110 -110  0  c
3 -96 -114    0  0  d
4 -91 -114    0  0  e

What I want to do is to perform operation to delete any column if there are 0 in more than 2 rows. How can I achieve that?

Finally will have data frame with this column: A, B, id.

Upvotes: 1

Views: 86

Answers (1)

jezrael
jezrael

Reputation: 863501

You can use cumsum with any for mask and then a bit changed boolean indexing for selecting by columns:

mask = ((df == 0).cumsum() > 1).any()
print (mask)
A     False
B     False
C      True
id    False
dtype: bool

print (df.ix[:, ~mask])
    A    B id
0 -14 -103  a
1 -90    0  b
2 -90 -110  c
3 -96 -114  d
4 -91 -114  e

Explanation of masks:

print (df == 0)
       A      B      C     id
0  False  False  False  False
1  False   True   True  False
2  False  False  False  False
3  False  False   True  False
4  False  False   True  False

print ((df == 0).cumsum())
   A  B  C  id
0  0  0  0   0
1  0  1  1   0
2  0  1  1   0
3  0  1  2   0
4  0  1  3   0

print ((df == 0).cumsum() > 1)
       A      B      C     id
0  False  False  False  False
1  False  False  False  False
2  False  False  False  False
3  False  False  False  False
4  False  False   True  False

EDIt by comment - need all for mask:

mask = (df == 0).all()
print (mask)
A     False
B     False
C     False
D      True
id    False
dtype: bool

print (df.ix[:, ~mask])
    A    B    C id
0 -14 -103 -101  a
1 -90    0    0  b
2 -90 -110 -110  c
3 -96 -114    0  d
4 -91 -114    0  e

Upvotes: 3

Related Questions