Reputation: 64064
I have the following data frame:
import pandas as pd
df = pd.DataFrame({'id':['a','b','c','d','e'],
'A':[-14,-90,-90,-96,-91],
'B':[-103,0,-110,-114,-114],
'D':[0,0,0,0,0],
'C':[-101,0,-110,0,0]})
It looks like this:
A B C D id
0 -14 -103 -101 0 a
1 -90 0 0 0 b
2 -90 -110 -110 0 c
3 -96 -114 0 0 d
4 -91 -114 0 0 e
What I want to do is to perform operation to delete any column if there are 0 in more than 2 rows. How can I achieve that?
Finally will have data frame with this column: A, B, id.
Upvotes: 1
Views: 86
Reputation: 863501
You can use cumsum
with any
for mask and then a bit changed boolean indexing
for selecting by columns:
mask = ((df == 0).cumsum() > 1).any()
print (mask)
A False
B False
C True
id False
dtype: bool
print (df.ix[:, ~mask])
A B id
0 -14 -103 a
1 -90 0 b
2 -90 -110 c
3 -96 -114 d
4 -91 -114 e
Explanation of masks:
print (df == 0)
A B C id
0 False False False False
1 False True True False
2 False False False False
3 False False True False
4 False False True False
print ((df == 0).cumsum())
A B C id
0 0 0 0 0
1 0 1 1 0
2 0 1 1 0
3 0 1 2 0
4 0 1 3 0
print ((df == 0).cumsum() > 1)
A B C id
0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False True False
EDIt by comment - need all
for mask:
mask = (df == 0).all()
print (mask)
A False
B False
C False
D True
id False
dtype: bool
print (df.ix[:, ~mask])
A B C id
0 -14 -103 -101 a
1 -90 0 0 b
2 -90 -110 -110 c
3 -96 -114 0 d
4 -91 -114 0 e
Upvotes: 3