How to efficiently remove leading rows containing only 0 as value?

Question

I have a pandas dataframe whose first rows contain only zeros. I would like to remove those rows.

Let df be my dataframe and ['a', 'b', 'c'] its columns. I tried the following code:

df[(df[['a', 'b', 'c']] != 0).all(axis=1)]

But this also removes every later row that contains a 0, not only the leading all-zero rows.

That's not what I want. I only want to remove the leading all-zero rows and keep everything after them, including later rows that contain zeros.

It would be great to have a simple and efficient solution using pandas functions. Thanks
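
To make the problem concrete, here is a minimal sketch with a sample frame reconstructed from the output printed in the accepted answer. The two leading all-zero rows (index 0 and 1) are an assumption, since only the rows from index 2 onward appear there.

```python
import pandas as pd

# Sample data: rows 2-7 match the output printed in the accepted answer;
# rows 0 and 1 (the leading all-zero rows) are assumed.
df = pd.DataFrame({
    "a": [0, 0, 1, 0, 2, 4, 0, 1],
    "b": [0, 0, 0, 0, 3, 5, 0, 1],
    "c": [0, 0, 0, 0, 5, 6, 0, 1],
})

# The attempt from the question: keeps only rows where every value is non-zero.
filtered = df[(df[["a", "b", "c"]] != 0).all(axis=1)]
print(filtered.index.tolist())  # [4, 5, 7]
```

With this data the attempt drops rows 2, 3 and 6 as well, although they come after the leading block of zeros.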

jezrael · Accepted Answer

General solution, which also works when every row in the data is all 0: take the cumulative sum of the non-zero mask with cumsum, then keep the rows where any cumulative count is non-zero:

df1 = df[(df[['a', 'b', 'c']] != 0).cumsum().any(axis=1)]
print(df1)
   a  b  c
2  1  0  0
3  0  0  0
4  2  3  5
5  4  5  6
6  0  0  0
7  1  1  1
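
A step-by-step sketch of why the cumulative sum works, on assumed sample data (rows 0 and 1 are guessed; the rest follows the printed output). The last two lines show a possible variant using Series.cummax, which is not part of the original answer.

```python
import pandas as pd

# Sample data: rows 2-7 follow the printed output; rows 0 and 1 are assumed.
df = pd.DataFrame({
    "a": [0, 0, 1, 0, 2, 4, 0, 1],
    "b": [0, 0, 0, 0, 3, 5, 0, 1],
    "c": [0, 0, 0, 0, 5, 6, 0, 1],
})

nonzero = df[["a", "b", "c"]] != 0   # boolean frame: True where a value is non-zero
seen = nonzero.cumsum()              # per column: how many non-zero values so far
keep = seen.any(axis=1)              # True once any column has seen a non-zero value
df1 = df[keep]
print(df1.index.tolist())            # [2, 3, 4, 5, 6, 7]

# Variant (not from the answer): collapse to one boolean per row first,
# then propagate the first True downward with cummax.
keep_alt = nonzero.any(axis=1).cummax()
print(keep.tolist() == keep_alt.tolist())  # True
```

Because the counts never decrease, a row stays selected once the first non-zero row has been reached, so later all-zero rows such as 3 and 6 are kept.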

Solution if the data contains at least one non-zero row: get the index of the first non-zero row with Series.idxmax and slice from there:

# idxmax returns an index label, so slice with .loc (label-based) rather than .iloc
df1 = df.loc[(df[['a', 'b', 'c']] != 0).any(axis=1).idxmax():]
print(df1)
   a  b  c
2  1  0  0
3  0  0  0
4  2  3  5
5  4  5  6
6  0  0  0
7  1  1  1
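
The restriction to "at least one non-zero row" matters because idxmax on an all-False Series returns the first label, so the slice would keep every row. A minimal sketch of a guarded, position-based version follows; the helper name drop_leading_zero_rows and the sample data are hypothetical and not part of the original answer.

```python
import pandas as pd


def drop_leading_zero_rows(df, cols):
    """Hypothetical helper: drop leading rows in which every column in `cols` is 0."""
    # One boolean per row: True if the row has at least one non-zero value.
    has_nonzero = (df[cols] != 0).any(axis=1).to_numpy()
    if not has_nonzero.any():
        # No non-zero row at all: every row is a leading zero row.
        return df.iloc[0:0]
    # argmax on a boolean array gives the position of the first True.
    return df.iloc[has_nonzero.argmax():]


# Assumed sample data with a non-default string index.
df = pd.DataFrame(
    {"a": [0, 0, 1, 0], "b": [0, 0, 0, 0], "c": [0, 0, 0, 0]},
    index=["w", "x", "y", "z"],
)
trimmed = drop_leading_zero_rows(df, ["a", "b", "c"])
all_zero = drop_leading_zero_rows(df.iloc[:2], ["a", "b", "c"])
print(trimmed.index.tolist())  # ['y', 'z']
print(len(all_zero))           # 0
```

Working with positions (to_numpy plus argmax and iloc) avoids any dependence on the index labels, at the cost of an explicit check for the all-zero case.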
