Reputation: 300
I have a pandas DataFrame whose first rows contain only zeros, and I would like to remove those rows.
Let df be my DataFrame and ['a', 'b', 'c'] its columns. I tried the following code:
df[(df[['a', 'b', 'c']] != 0).any(axis=1)]
But it also turns the following DataFrame:
a b c
0 0 0
0 0 0
1 0 0
0 0 0
2 3 5
4 5 6
0 0 0
1 1 1
Into this one:
a b c
1 0 0
2 3 5
4 5 6
1 1 1
That's not what I want: I only want to remove the leading all-zero rows. So I would like to get:
a b c
1 0 0
0 0 0
2 3 5
4 5 6
0 0 0
1 1 1
It would be great to have a simple and efficient solution using pandas functions. Thanks.
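A minimal sketch reproducing the example above, assuming integer values and pandas' default RangeIndex (labels 0 to 7). A row-wise filter that keeps rows with at least one non-zero value also drops the all-zero rows in the middle, which is the problem described:

```python
import pandas as pd

# Sample data from the question (default RangeIndex 0..7).
df = pd.DataFrame({
    "a": [0, 0, 1, 0, 2, 4, 0, 1],
    "b": [0, 0, 0, 0, 3, 5, 0, 1],
    "c": [0, 0, 0, 0, 5, 6, 0, 1],
})

# Row-wise filter: keeps every row with at least one non-zero value,
# so the interior all-zero rows (labels 3 and 6) are dropped too.
row_filtered = df[(df[["a", "b", "c"]] != 0).any(axis=1)]
print(row_filtered.index.tolist())  # [2, 4, 5, 7]

# Desired result: drop only the leading all-zero rows (labels 0 and 1).
expected = df.iloc[2:]
print(expected.index.tolist())  # [2, 3, 4, 5, 6, 7]
```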
Upvotes: 2
Views: 218
Reputation: 863166
General solution, which also works if every row in the data is all 0: first use cumsum for a cumulative sum of the non-zero mask, then test for any True per row:
df1 = df[(df[['a', 'b', 'c']] != 0).cumsum().any(axis=1)]
print(df1)
a b c
2 1 0 0
3 0 0 0
4 2 3 5
5 4 5 6
6 0 0 0
7 1 1 1
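A step-by-step sketch of why the cumsum mask works, using the question's sample data. Each column's cumulative count becomes positive at that column's first non-zero value, so any(axis=1) is True from the first row with any non-zero value onward. The cummax variant at the end is an equivalent alternative added here for comparison, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 1, 0, 2, 4, 0, 1],
    "b": [0, 0, 0, 0, 3, 5, 0, 1],
    "c": [0, 0, 0, 0, 5, 6, 0, 1],
})
cols = ["a", "b", "c"]

nonzero = df[cols] != 0          # True where a value is non-zero
running = nonzero.cumsum()       # per column: count of non-zeros seen so far
keep = running.any(axis=1)       # True from the first row with any non-zero on
print(keep.tolist())
# [False, False, True, True, True, True, True, True]

# Alternative (not from the answer): once any row has a non-zero value,
# the running maximum of the per-row flag stays True.
keep_alt = nonzero.any(axis=1).cummax()

df1 = df[keep]
print(df1.index.tolist())  # [2, 3, 4, 5, 6, 7]

# All-zero data: the mask is all False, so the result is empty.
zeros = pd.DataFrame(0, index=range(3), columns=cols)
kept = zeros[(zeros[cols] != 0).cumsum().any(axis=1)]
print(len(kept))  # 0
```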
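Solution if there is at least one non-0 row in the data (if every row is 0, idxmax returns the first label and nothing is removed): get the label of the first non-0 row with Series.idxmax. Because idxmax returns an index label rather than a position, slice with loc:
df1 = df.loc[(df[['a', 'b', 'c']] != 0).any(axis=1).idxmax():]
print(df1)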
a b c
2 1 0 0
3 0 0 0
4 2 3 5
5 4 5 6
6 0 0 0
7 1 1 1
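For a frame whose index is not the default RangeIndex, here is a minimal sketch that works purely with positions (via NumPy's argmax on the mask) and handles the all-zero case explicitly. The helper name drop_leading_zero_rows is hypothetical:

```python
import pandas as pd

def drop_leading_zero_rows(df, cols):
    """Hypothetical helper: drop the leading rows whose `cols` are all zero.

    Slices by position, so it works with any index, and returns an
    empty frame when every row is all zeros.
    """
    mask = (df[cols] != 0).any(axis=1).to_numpy()
    if not mask.any():
        # argmax would point at row 0 here, so handle this case explicitly.
        return df.iloc[0:0]
    first = int(mask.argmax())  # position of the first True
    return df.iloc[first:]

# Non-default string index to show that slicing is positional.
df = pd.DataFrame(
    {"a": [0, 0, 1, 0, 2], "b": [0, 0, 0, 0, 3], "c": [0, 0, 0, 0, 5]},
    index=["r0", "r1", "r2", "r3", "r4"],
)
out = drop_leading_zero_rows(df, ["a", "b", "c"])
print(out.index.tolist())  # ['r2', 'r3', 'r4']

zeros = pd.DataFrame(0, index=["x", "y"], columns=["a", "b", "c"])
print(len(drop_leading_zero_rows(zeros, ["a", "b", "c"])))  # 0
```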
Upvotes: 1
Reputation: 1821
Here is an example that finds the first row that is not all zeros and then selects every row from that point on. It should solve the problem you are describing:
ix_first_valid = df[(df != 0).any(axis=1)].index[0]  # raises IndexError if every row is all zeros
df.loc[ix_first_valid:]
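A variant of the same idea, sketched with DataFrame.first_valid_index after turning zeros into NaN with DataFrame.mask. It assumes the data contains no NaN values to begin with: here a pre-existing NaN would be treated like a zero, whereas the df != 0 comparison treats NaN as non-zero. Unlike .index[0], it returns None instead of raising when every row is all zeros:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 1, 0, 2, 4, 0, 1],
    "b": [0, 0, 0, 0, 3, 5, 0, 1],
    "c": [0, 0, 0, 0, 5, 6, 0, 1],
})

# Turn zeros into NaN (assumes no NaN in the original data);
# first_valid_index then returns the label of the first row with at
# least one non-NaN (i.e. non-zero) value, or None if there is none.
first = df.mask(df == 0).first_valid_index()

# Slice the original frame by label so the integer dtypes are kept;
# fall back to an empty frame when every row is zero.
result = df.loc[first:] if first is not None else df.iloc[0:0]
print(result.index.tolist())  # [2, 3, 4, 5, 6, 7]
```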
Upvotes: 1