Reputation: 19993
I have a collection of DataFrames with zero-value head and tail rows. I would like to drop these zero-value rows and keep the range between them.
For example, given this DataFrame:
x
0 0
1 0
2 1
3 0
4 3
5 1
6 0
7 0
I would like a function that returns this:
x
2 1
3 0
4 3
5 1
Note that the zero value at row 3 has not been lost. I don't want to remove all zeroes, only those at the head and tail. How can this be done with Pandas?
Upvotes: 1
Views: 932
Reputation: 1
import pandas as pd
d = {'x': [0, 0, 1, 2, 1, 0, 0, 4, 4, 0, 0]}
df = pd.DataFrame(d)
Remove tail zeros:
# walk backwards from the last row while the value is zero
i = len(df) - 1
while i >= 0 and df['x'].iloc[i] == 0:
    i -= 1
df = df.iloc[:i + 1]
Remove head zeros:
# walk forwards from the first row while the value is zero
i = 0
while i < len(df) and df['x'].iloc[i] == 0:
    i += 1
df = df.iloc[i:]
Upvotes: 0
Reputation: 8273
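This can also be useful: nonzero gives the positions of the non-zero elements. Taking the first and last of those positions and slicing between them produces the expected output. (Series.nonzero was deprecated in pandas 0.24 and later removed, so the code below calls nonzero on the underlying NumPy array.)
import pandas as pd
df2 = pd.DataFrame({'cols': [0, 0, 1, 0, 3, 1, 0, 0]})
# nonzero() returns a tuple of arrays; element 0 holds the positions
non_zero_index = df2['cols'].to_numpy().nonzero()[0]
start, end = non_zero_index[0], non_zero_index[-1]
# positional slice; iloc excludes the stop, hence end + 1
df2.iloc[start:end + 1]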
cols
2 1
3 0
4 3
5 1
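The same idea can be wrapped in a small helper. This is only a minimal sketch, assuming the column is numeric: the name trim_zero_rows is hypothetical, and np.flatnonzero is swapped in for Series.nonzero. It slices by position, so it does not depend on the index labels, and it returns an empty frame when every value is zero.

```python
import numpy as np
import pandas as pd


def trim_zero_rows(df, col):
    """Drop leading and trailing rows where df[col] == 0 (hypothetical helper)."""
    # Positions (not labels) of the non-zero entries.
    pos = np.flatnonzero(df[col].to_numpy())
    if pos.size == 0:
        # Every value is zero: nothing to keep.
        return df.iloc[0:0]
    # iloc excludes the stop position, hence the + 1.
    return df.iloc[pos[0]:pos[-1] + 1]


df2 = pd.DataFrame({'cols': [0, 0, 1, 0, 3, 1, 0, 0]})
trimmed = trim_zero_rows(df2, 'cols')
print(trimmed)
```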
Upvotes: 3
Reputation: 323326
Inspired by mad_'s answer:
df.loc[df.x.mask(df.x==0).first_valid_index():df.x.mask(df.x==0).last_valid_index()]
Out[39]:
x
2 1
3 0
4 3
5 1
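Because first_valid_index and last_valid_index return index labels, and .loc slices by label with both ends included, this approach should also work when the index is not the default 0..n-1, provided the labels are unique. A minimal sketch with made-up string labels:

```python
import pandas as pd

# The string labels are an assumption for illustration only.
df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]},
                  index=list('abcdefgh'))

# Replace zeros with NaN so the *_valid_index methods skip them.
masked = df['x'].mask(df['x'] == 0)
first = masked.first_valid_index()  # label of the first non-zero row
last = masked.last_valid_index()    # label of the last non-zero row

# .loc includes both end labels.
res = df.loc[first:last]
print(res)
```

If every value is zero, both labels come back as None and the slice would select the whole frame, so that case needs its own check.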
Upvotes: 0
Reputation: 153500
I'll put my horse in this race too.
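Take the cumulative sum forwards and backwards, take the element-wise minimum of the two, and discard the rows where it equals zero. (This relies on the values being non-negative.)
import numpy as np
df[np.minimum(df['x'].cumsum(), df['x'][::-1].cumsum()[::-1]).ne(0)]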
Output:
x
2 1
3 0
4 3
5 1
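To see why this works, it may help to name the intermediate Series. The cumulative sums only stay at zero through the leading and trailing zeros if the values are non-negative, so the last lines below show an alternative sketch (not from the answer above) that uses a boolean cummax and does not rely on that assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]})

fwd = df['x'].cumsum()                # 0 until the first non-zero value
bwd = df['x'][::-1].cumsum()[::-1]    # 0 after the last non-zero value
keep = np.minimum(fwd, bwd).ne(0)     # True only between the two
print(df[keep])

# Alternative that does not assume non-negative values:
nz = df['x'].ne(0)
# cummax turns True at the first non-zero value and stays True.
keep_any_sign = nz.cummax() & nz[::-1].cummax()[::-1]
print(df[keep_any_sign])
```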
Upvotes: 1
Reputation: 164773
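For large arrays with few leading and trailing zeros, manual iteration will be more efficient than Boolean indexing, because the scan stops at the first non-zero value from each end. For example, via a generator expression with next and enumerate:
start = next(idx for idx, val in enumerate(df['x']) if val != 0)
# count the trailing zeros and convert to a stop position; a negated count
# would become 0 (an empty slice) when there are no trailing zeros
end = len(df) - next(idx for idx, val in enumerate(df['x'].iloc[::-1]) if val != 0)
res = df['x'].iloc[start:end]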
2 1
3 0
4 3
5 1
Name: x, dtype: int64
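One caveat: next raises StopIteration when the column contains no non-zero value at all. A hedged sketch that wraps the search in a hypothetical helper, trim_zeros_iter, and passes a default to next for that case:

```python
import pandas as pd


def trim_zeros_iter(s):
    """Trim leading/trailing zeros from a Series by scanning from both ends."""
    n = len(s)
    # Position of the first non-zero value, or n if there is none.
    start = next((i for i, v in enumerate(s) if v != 0), n)
    # Number of trailing zeros, or n if every value is zero.
    tail = next((i for i, v in enumerate(s.iloc[::-1]) if v != 0), n)
    # max() keeps the slice empty (start:start) in the all-zero case.
    return s.iloc[start:max(start, n - tail)]


df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]})
res = trim_zeros_iter(df['x'])
print(res)
```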
Upvotes: 1
Reputation: 863176
Use:
df = df[df['x'].cumsum().mul(df['x'].iloc[::-1].cumsum()).ne(0)]
print (df)
x
2 1
3 0
4 3
5 1
Explanation:
Get the cumulative sum of the column with cumsum:
print (df['x'].cumsum())
0 0
1 0
2 1
3 1
4 4
5 5
6 5
7 5
Name: x, dtype: int64
Reverse the column and take cumsum again:
print (df['x'].iloc[::-1].cumsum())
7 0
6 0
5 1
4 4
3 4
2 5
1 5
0 5
Name: x, dtype: int64
Multiply the two with Series.mul, which aligns them on the index:
print (df['x'].cumsum().mul(df['x'].iloc[::-1].cumsum()))
0 0
1 0
2 5
3 4
4 16
5 5
6 0
7 0
Name: x, dtype: int64
Then check for values not equal to 0 with ne (!=):
print (df['x'].cumsum().mul(df['x'].iloc[::-1].cumsum()).ne(0))
0 False
1 False
2 True
3 True
4 True
5 True
6 False
7 False
Name: x, dtype: bool
Finally, filter by boolean indexing.
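Since the question mentions a collection of DataFrames, the steps above could be wrapped in a function and applied with a list comprehension. This is a sketch only: trim_by_cumsum and frames are hypothetical names, the values are assumed non-negative (as the cumulative sums require), and the index is assumed unique so that mul aligns row for row.

```python
import pandas as pd


def trim_by_cumsum(df, col='x'):
    """Keep the rows between the first and last non-zero value of df[col]."""
    fwd = df[col].cumsum()
    bwd = df[col].iloc[::-1].cumsum()
    # mul aligns on the index, so bwd need not be reversed back.
    return df[fwd.mul(bwd).ne(0)]


# Hypothetical collection of frames with zero-valued head and tail rows.
frames = [
    pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]}),
    pd.DataFrame({'x': [0, 2, 0, 0]}),
]
trimmed = [trim_by_cumsum(f) for f in frames]
for t in trimmed:
    print(t)
```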
Thank you, @Wen for another solution:
df[(df.x.eq(0).cumprod().eq(0))&(df.x[::-1].eq(0).cumprod().eq(0))]
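This variant may be easier to follow when split into steps. The cumulative product of the "is zero" flags stays 1 only while every value seen so far is zero, so comparing it with 0 marks everything from the first non-zero value onward. Doing the same on the reversed column marks everything up to the last non-zero value. A small sketch on the question's data; the intermediate names and the explicit cast to int are additions for illustration:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 0, 1, 0, 3, 1, 0, 0]})

# 1 where the value is zero, 0 elsewhere (cast so the product is numeric).
is_zero = df['x'].eq(0).astype(int)

# True from the first non-zero value to the end.
after_head = is_zero.cumprod().eq(0)
# True from the start to the last non-zero value (index stays reversed,
# but & aligns on the index labels).
before_tail = is_zero[::-1].cumprod().eq(0)

res = df[after_head & before_tail]
print(res)
```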
Upvotes: 4