Ollie Glass

Reputation: 19993

Remove zero value head and tail rows from DataFrame

I have a collection of DataFrames with zero-value head and tail rows. I would like to drop these zero-value rows and keep the range between them.

For example, given this DataFrame:

   x
0  0
1  0
2  1 
3  0
4  3
5  1
6  0
7  0

I would like a function that returns this:

   x
2  1 
3  0
4  3
5  1

Note that the zero value at row 3 has not been lost. I don't want to remove all zeros, only those at the head and tail. How can this be done with pandas?

Upvotes: 1

Views: 932

Answers (6)

Sam

Reputation: 1

import pandas as pd

d = {'x': [0, 0, 1, 2, 1, 0, 0, 4, 4, 0, 0]}
df = pd.DataFrame(d)

Remove tail zeros:

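# Walk back from the last row until a non-zero value is found.
i = len(df) - 1
while i >= 0 and df['x'].iloc[i] == 0:
    i -= 1
# Keep everything up to and including that row.
df = df.iloc[:i + 1]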

Remove head zeros:

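# Walk forward from the first row until a non-zero value is found.
i = 0
while i < len(df) and df['x'].iloc[i] == 0:
    i += 1
# Keep everything from that row onwards.
df = df.iloc[i:]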

Upvotes: 0

mad_

Reputation: 8273

This may also be useful. nonzero returns the positions of the non-zero elements as a tuple of arrays. Slicing from the first to the last of those positions gives the expected output:

import pandas as pd
df2=pd.DataFrame({'cols':[0,0,1,0,3,1,0,0]})
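# Series.nonzero() was removed in pandas 1.0, so call nonzero on the underlying NumPy array.
non_zero_index = df2.cols.to_numpy().nonzero()[0]
start, end = non_zero_index[0], non_zero_index[-1]
# nonzero gives positions, so slice by position; iloc excludes the stop, hence end + 1.
df2.iloc[start:end + 1]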
    cols
2   1
3   0
4   3
5   1
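
Since the question asks for a function, the same idea can be wrapped up as one. This is only a sketch: trim_zero_rows is a hypothetical helper name, and returning an empty frame when every value is zero is an assumption about the desired behaviour, not something the question specifies.

```python
import numpy as np
import pandas as pd


def trim_zero_rows(df, col):
    """Drop leading and trailing rows where df[col] is zero (hypothetical helper)."""
    # Positions (not labels) of the non-zero entries.
    positions = np.flatnonzero(df[col].to_numpy())
    if positions.size == 0:
        # Assumption: an all-zero column leaves nothing between head and tail.
        return df.iloc[0:0]
    # iloc excludes the stop position, so add 1 to keep the last non-zero row.
    return df.iloc[positions[0]:positions[-1] + 1]


df2 = pd.DataFrame({'cols': [0, 0, 1, 0, 3, 1, 0, 0]})
trimmed = trim_zero_rows(df2, 'cols')
print(trimmed)
```

Because it slices by position, this should behave the same whatever index the DataFrame has.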

Upvotes: 3

BENY

Reputation: 323326

Inspired by mad_'s answer:

df.loc[df.x.mask(df.x==0).first_valid_index():df.x.mask(df.x==0).last_valid_index()]
Out[39]: 
   x
2  1
3  0
4  3
5  1
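
To make the one-liner above easier to follow, here is a sketch that splits it into steps. The string index is an assumption added for illustration, to show that first_valid_index and last_valid_index return labels, which is why they pair with loc.

```python
import pandas as pd

# Illustrative data: same values as the question, but with a non-default index.
df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]}, index=list('abcdefgh'))

# Zeros become NaN, so the "valid" entries are exactly the non-zero ones.
masked = df['x'].mask(df['x'] == 0)
first = masked.first_valid_index()  # label of the first non-zero row
last = masked.last_valid_index()    # label of the last non-zero row

# loc slices by label and includes both endpoints.
result = df.loc[first:last]
print(result)
```

If the column is all zeros, both lookups return None, so it is probably worth checking for that before slicing.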

Upvotes: 0

Scott Boston

Reputation: 153500

I'll put my horse in this race too.

Take the cumulative sum forwards and backwards, then the element-wise minimum of the two, and discard the rows where that minimum is zero.

import numpy as np

df[np.minimum(df['x'].cumsum(), df['x'][::-1].cumsum()[::-1]).ne(0)]

Output:

   x
2  1
3  0
4  3
5  1
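
As a sketch of why this works, the intermediate Series can be pulled out and inspected. The names forward, backward and keep are introduced here for illustration only. Note the assumption that the values are non-negative: with negative values a running sum can return to zero in the middle of the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]})

# Running total from the top: stays 0 until the first non-zero value.
forward = df['x'].cumsum()
# Running total from the bottom, flipped back into the original row order:
# it is 0 only on the rows after the last non-zero value.
backward = df['x'][::-1].cumsum()[::-1]

# The minimum is 0 exactly where either running total is still 0.
keep = np.minimum(forward, backward).ne(0)
print(df[keep])
```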

Upvotes: 1

jpp

Reputation: 164773

For large arrays with few leading and trailing zeros, manual iteration can be more efficient than Boolean indexing, because it stops at the first non-zero value from each end. For example, via a generator expression with next and enumerate:

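start = next(idx for idx, val in enumerate(df['x']) if val != 0)
# Count the trailing zeros and subtract from the length; a negative stop of 0 would give an empty slice.
end = len(df) - next(idx for idx, val in enumerate(df['x'].iloc[::-1]) if val != 0)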

res = df['x'].iloc[start:end]

2    1
3    0
4    3
5    1
Name: x, dtype: int64

Upvotes: 1

jezrael

Reputation: 863176

Use:

df = df[df['x'].cumsum().mul(df['x'].iloc[::-1].cumsum()).ne(0)]
print (df)
   x
2  1
3  0
4  3
5  1

Explanation:

Get the cumulative sum of the column:

print (df['x'].cumsum())
0    0
1    0
2    1
3    1
4    4
5    5
6    5
7    5
Name: x, dtype: int64

Reverse the column and take the cumulative sum again:

print (df['x'].iloc[::-1].cumsum())
7    0
6    0
5    1
4    4
3    4
2    5
1    5
0    5
Name: x, dtype: int64

Multiply the two with Series.mul, which aligns them on the index:

print (df['x'].cumsum().mul(df['x'].iloc[::-1].cumsum()))
0     0
1     0
2     5
3     4
4    16
5     5
6     0
7     0
Name: x, dtype: int64

Then check for values not equal to 0 with ne (!=):

print (df['x'].cumsum().mul(df['x'].iloc[::-1].cumsum()).ne(0))
0    False
1    False
2     True
3     True
4     True
5     True
6    False
7    False
Name: x, dtype: bool

Finally, filter by boolean indexing.

Thank you, @Wen for another solution:

df[(df.x.eq(0).cumprod().eq(0))&(df.x[::-1].eq(0).cumprod().eq(0))]  
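
A step-by-step sketch of that cumprod idea, for anyone who wants to see the intermediate masks. The names is_zero, leading and trailing are illustrative, and the reversed Series is flipped back explicitly here rather than relying on index alignment. Since it only tests whether each value is zero, it should not depend on the sign of the values.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]})

is_zero = df['x'].eq(0)
# The running product stays 1 only while every value so far has been zero.
leading = is_zero.cumprod().eq(1)
# Same from the bottom up, then flipped back into the original row order.
trailing = is_zero[::-1].cumprod()[::-1].eq(1)

# Keep the rows that are in neither the leading nor the trailing run of zeros.
result = df[~leading & ~trailing]
print(result)
```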

Upvotes: 4
