kakarotto

Reputation: 300

How to efficiently remove leading rows containing only 0 as value?

I have a pandas dataframe whose first rows contain only zeros. I would like to remove those rows.

Denoting my dataframe df and its columns ['a', 'b', 'c'], I tried the following code:

df[(df[['a', 'b', 'c']] != 0).any(axis=1)]

But it also turns the following dataframe:

a b c
0 0 0
0 0 0
1 0 0
0 0 0
2 3 5
4 5 6
0 0 0
1 1 1 

into this one:

a b c
1 0 0
2 3 5
4 5 6
1 1 1

That's not what I want: only the leading all-zero rows should be dropped. So, I would like to have:

a b c
1 0 0
0 0 0
2 3 5
4 5 6 
0 0 0
1 1 1
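For reference, here is a minimal sketch that rebuilds the example frame above (assuming the default integer index 0 to 7) and shows that a filter keeping rows with at least one non-zero value removes every all-zero row, including the one in the middle, rather than only the leading ones:

```python
import pandas as pd

# The example frame from the question, with a default RangeIndex 0..7
df = pd.DataFrame(
    {
        "a": [0, 0, 1, 0, 2, 4, 0, 1],
        "b": [0, 0, 0, 0, 3, 5, 0, 1],
        "c": [0, 0, 0, 0, 5, 6, 0, 1],
    }
)

# Row-wise filter: keep rows that have at least one non-zero value.
# This drops every all-zero row (labels 0, 1, 3 and 6), not only the leading ones.
nonzero_rows = (df[["a", "b", "c"]] != 0).any(axis=1)
filtered = df[nonzero_rows]
print(list(filtered.index))  # [2, 4, 5, 7]
```

The desired result would instead keep labels 2 through 7, so the zero rows at labels 3 and 6 must survive.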

Ideally the solution would be simple and efficient, using built-in pandas functions. Thanks!

Upvotes: 2

Views: 218

Answers (2)

jezrael

Reputation: 863166

General solution, which also works if every row in the data is all zeros: first take the cumulative sum of the non-zero mask with cumsum, then test for any True per row:

df1 = df[(df[['a', 'b', 'c']] != 0).cumsum().any(axis=1)]
print(df1)
   a  b  c
2  1  0  0
3  0  0  0
4  2  3  5
5  4  5  6
6  0  0  0
7  1  1  1
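To see why the cumsum trick works, the sketch below prints the intermediate steps on the question's sample data: (df != 0).cumsum() counts, per column, how many non-zero values have appeared so far, so a row is kept as soon as any column has seen a non-zero value, and every later row stays kept. The cummax line is an equivalent formulation added here only for comparison; it is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [0, 0, 1, 0, 2, 4, 0, 1],
        "b": [0, 0, 0, 0, 3, 5, 0, 1],
        "c": [0, 0, 0, 0, 5, 6, 0, 1],
    }
)

nonzero = df[["a", "b", "c"]] != 0   # True where a cell is non-zero
running = nonzero.cumsum()           # per-column count of non-zero cells seen so far
keep = running.any(axis=1)           # True once any column has seen a non-zero
print(running)
print(df[keep])

# Equivalent mask: flag rows with any non-zero value, then carry True forward
keep_alt = nonzero.any(axis=1).cummax()
print((keep == keep_alt).all())  # True
```

Once a count becomes positive it can never drop back to zero, which is why interior all-zero rows (labels 3 and 6) are kept.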

Solution if the data contains at least one non-zero row: get the label of the first non-zero row with Series.idxmax:

df1 = df.iloc[(df[['a', 'b', 'c']] != 0).any(axis=1).idxmax():]
print(df1)
   a  b  c
2  1  0  0
3  0  0  0
4  2  3  5
5  4  5  6
6  0  0  0
7  1  1  1
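One caveat: idxmax returns an index label, while iloc expects a position, so the snippet above relies on the default RangeIndex, where the two coincide. It also keeps every row when all rows are zero, because idxmax then returns the first label. A minimal position-based sketch, assuming the same columns, could use NumPy's argmax on the mask instead; the drop_leading_zero_rows helper below is hypothetical, not a pandas function.

```python
import pandas as pd


def drop_leading_zero_rows(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: drop leading rows whose `cols` are all zero.

    Works on positions, so the index can be anything (dates, strings, ...).
    """
    has_nonzero = (df[cols] != 0).any(axis=1).to_numpy()
    if not has_nonzero.any():
        # Every row is all zeros: nothing to keep
        return df.iloc[0:0]
    first_pos = int(has_nonzero.argmax())  # position of the first True
    return df.iloc[first_pos:]


# Same data as the question, but with a non-default string index
df = pd.DataFrame(
    {
        "a": [0, 0, 1, 0, 2, 4, 0, 1],
        "b": [0, 0, 0, 0, 3, 5, 0, 1],
        "c": [0, 0, 0, 0, 5, 6, 0, 1],
    },
    index=list("pqrstuvw"),
)
print(drop_leading_zero_rows(df, ["a", "b", "c"]))
```

Here the result starts at label "r", the third row, regardless of what the index labels are.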

Upvotes: 1

John Sloper

Reputation: 1821

Here is an example that finds the first row that is not all zeros and then selects everything from that row on. It should solve the problem you are describing:

ix_first_valid = df[(df != 0).any(axis=1)].index[0]
df.loc[ix_first_valid:]
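If the frame might consist only of zero rows, the filtered frame is empty and .index[0] raises an IndexError. A guarded sketch of the same label-based idea (the variable names are illustrative) might look like this:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [0, 0, 1, 0, 2, 4, 0, 1],
        "b": [0, 0, 0, 0, 3, 5, 0, 1],
        "c": [0, 0, 0, 0, 5, 6, 0, 1],
    }
)

# Labels of the rows that contain at least one non-zero value
mask = (df != 0).any(axis=1)
nonzero_labels = mask[mask].index

if len(nonzero_labels) > 0:
    # .loc slices by label and includes the start label
    result = df.loc[nonzero_labels[0]:]
else:
    # All rows are zero: .index[0] would raise IndexError, so return an empty frame
    result = df.iloc[0:0]

print(result)
```

Slicing with .loc keeps the selection label-based, consistent with the label returned by .index[0].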

Upvotes: 1
