Reputation: 53
I have a pandas (multi-index) dataframe as follows:
date Volume
Account ID
10001 2 02-03-2017 0
3 02-03-2017 0
3 09-03-2017 0
3 16-03-2017 50
3 21-03-2017 65
3 28-03-2017 0
3 04-04-2017 0
3 11-04-2017 60
10002 5 02-03-2017 14.5
6 09-03-2017 14.5
3 09-03-2017 0
3 16-03-2017 0
3 21-03-2017 20
3 28-03-2017 33
10003 8 21-03-2017 14.5
9 28-03-2017 15.0
Now I want to delete all rows at the beginning of a series (dates of an account-product combination) with volume 0. So I want to keep the rows with volume 0 in case they are in the middle or at the end of a series.
So in the above example, I'd want the following output:
date Volume
Account ID
10001 3 16-03-2017 50
3 21-03-2017 65
3 28-03-2017 0
3 04-04-2017 0
3 11-04-2017 60
10002 5 02-03-2017 14.5
6 09-03-2017 14.5
3 21-03-2017 20
3 28-03-2017 33
10003 8 21-03-2017 14.5
9 28-03-2017 15.0
Currently, I've been removing complete series with a filter, e.g.
df = data.groupby(level = acc_prod).filter(lambda x: len(x) > 26)
And I've seen examples of removing only the first row; Python: Pandas - Delete the first row by group . Yet I do not know how to only delete rows of zero at the beginning of an account-product series.
Would be great if someone could help me out on this!
Upvotes: 5
Views: 2142
Reputation: 863166
You can use boolean indexing
with mask
created by groupby
with cumsum
and find values which are not 0
:
print (df.groupby(level=['Account','ID'])['Volume'].cumsum())
Account ID
10001 2 0.0
3 0.0
3 0.0
3 50.0
3 115.0
3 115.0
3 115.0
3 175.0
10002 5 14.5
6 14.5
3 0.0
3 0.0
3 20.0
3 53.0
10003 8 14.5
9 15.0
Name: Volume, dtype: float64
mask = df.groupby(level=['Account','ID'])['Volume'].cumsum() != 0
#!= is same as ne function
#mask = df.groupby(level=['Account','ID'])['Volume'].cumsum().ne(0)
print (mask)
Account ID
10001 2 False
3 False
3 False
3 True
3 True
3 True
3 True
3 True
10002 5 True
6 True
3 False
3 False
3 True
3 True
10003 8 True
9 True
Name: Volume, dtype: bool
print (df[mask])
date Volume
Account ID
10001 3 16-03-2017 50.0
3 21-03-2017 65.0
3 28-03-2017 0.0
3 04-04-2017 0.0
3 11-04-2017 60.0
10002 5 02-03-2017 14.5
6 09-03-2017 14.5
3 21-03-2017 20.0
3 28-03-2017 33.0
10003 8 21-03-2017 14.5
9 28-03-2017 15.0
Upvotes: 4