Pandas Groupby CumSum Only on Consecutive Rows

Question

I am attempting to get a accumulative sum and count within a groupby in pandas, but only if the but only on consecutive row values. If the value is not consecutive, I'd like to accumulative sum and count to reset.

The groups are by "ID" and "STATUS", and the "DAYS" are the values being summed.

My current data set looks like this:

ID  DAYS  STATUS
MXD  31    Est
MXD  11    Est
MXD  29    RDDS
MXD  31    Est
------------------
NOA  28    Est
NOA  34    RDDS
NOA  27    Est
NOA  27    Est
------------------
TNT  34    RDDS
TNT  27    RDDS
TNT  27    Est
TNT  27    RDDS

My desired data set would look like this:

ID  DAYS  STATUS  CONSECT_ONLY_SUM CONSECT_ONLY_COUNT
MXD  31    Est          31                 1
MXD  11    Est          42                 2
MXD  29    RDDS         29                 1  - accumulative sum and count reset on new non-consect row
MXD  31    Est          31                 1
-------------------
NOA  28    Est          28                 1
NOA  34    RDDS         34                 1
NOA  27    Est          27                 1  - accumulative sum and count starts on consect "STATUS" rows
NOA  27    Est          54                 2
-------------------
TNT  34    RDDS         34                 1
TNT  27    RDDS         61                 2
TNT  27    Est          27                 1 - accumulative sum and count reset on new non-consect row
TNT  27    RDDS         27                 1

Any help on this would be greatly appreciated.

BENY · Accepted Answer

We need use shift with cumsum create the subgroup key , then we do cumsum and cumcount. Notice here I am using pd.Serise groupby

g=df['DAYS'].groupby([df['ID'],df.STATUS.ne(df.STATUS.shift()).cumsum()])
df['SUMCOND']=g.cumsum()
df['COUNTCOND']=g.cumcount()+1
df
     ID  DAYS STATUS  SUMCOND  COUNTCOND
0   MXD    31    Est       31          1
1   MXD    11    Est       42          2
2   MXD    29   RDDS       29          1
3   MXD    31    Est       31          1
4   NOA    28    Est       28          1
5   NOA    34   RDDS       34          1
6   NOA    27    Est       27          1
7   NOA    27    Est       54          2
8   TNT    34   RDDS       34          1
9   TNT    27   RDDS       61          2
10  TNT    27    Est       27          1
11  TNT    27   RDDS       27          1

Pandas Groupby CumSum Only on Consecutive Rows

Answers (1)

Related Questions