Reputation: 169
I am attempting to get a accumulative sum and count within a groupby in pandas, but only if the but only on consecutive row values. If the value is not consecutive, I'd like to accumulative sum and count to reset.
The groups are by "ID" and "STATUS", and the "DAYS" are the values being summed.
My current data set looks like this:
ID DAYS STATUS
MXD 31 Est
MXD 11 Est
MXD 29 RDDS
MXD 31 Est
------------------
NOA 28 Est
NOA 34 RDDS
NOA 27 Est
NOA 27 Est
------------------
TNT 34 RDDS
TNT 27 RDDS
TNT 27 Est
TNT 27 RDDS
My desired data set would look like this:
ID DAYS STATUS CONSECT_ONLY_SUM CONSECT_ONLY_COUNT
MXD 31 Est 31 1
MXD 11 Est 42 2
MXD 29 RDDS 29 1 - accumulative sum and count reset on new non-consect row
MXD 31 Est 31 1
-------------------
NOA 28 Est 28 1
NOA 34 RDDS 34 1
NOA 27 Est 27 1 - accumulative sum and count starts on consect "STATUS" rows
NOA 27 Est 54 2
-------------------
TNT 34 RDDS 34 1
TNT 27 RDDS 61 2
TNT 27 Est 27 1 - accumulative sum and count reset on new non-consect row
TNT 27 RDDS 27 1
Any help on this would be greatly appreciated.
Upvotes: 0
Views: 785
Reputation: 323226
We need use shift
with cumsum
create the subgroup key , then we do cumsum
and cumcount
. Notice here I am using pd.Serise groupby
g=df['DAYS'].groupby([df['ID'],df.STATUS.ne(df.STATUS.shift()).cumsum()])
df['SUMCOND']=g.cumsum()
df['COUNTCOND']=g.cumcount()+1
df
ID DAYS STATUS SUMCOND COUNTCOND
0 MXD 31 Est 31 1
1 MXD 11 Est 42 2
2 MXD 29 RDDS 29 1
3 MXD 31 Est 31 1
4 NOA 28 Est 28 1
5 NOA 34 RDDS 34 1
6 NOA 27 Est 27 1
7 NOA 27 Est 54 2
8 TNT 34 RDDS 34 1
9 TNT 27 RDDS 61 2
10 TNT 27 Est 27 1
11 TNT 27 RDDS 27 1
Upvotes: 1