Reputation: 656
I have a dataframe similar to this:

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'col2': [1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2],
                   'col3': [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0],
                   'desired': [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]})
I want to apply a rolling sum on col3 which resets when either col1 or col2 changes, or when the previous value of col3 was zero. Note that the count is offset by 1 cell, so the desired value for a new (col1, col2) combination is always zero.
The code below demonstrates the required logic. However, it takes nearly 4 minutes on the dataset below.

des = []
count = 0
for i in range(1, len(df)):
    des.append(count)
    if (df.iloc[i-1].col1 == df.iloc[i].col1) & \
       (df.iloc[i-1].col2 == df.iloc[i].col2) & \
       (df.iloc[i-1].col3 == 1):
        count += 1
    else:
        count = 0
des.append(count)
df['desired'] = des
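Incidentally, most of that time goes into the per-row .iloc lookups, not the logic itself; running the same loop over plain NumPy arrays is typically far faster. A sketch of that idea (identical logic, array access only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'col2': [1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2],
                   'col3': [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]})

# pull the columns out once, then index plain arrays instead of calling .iloc per row
c1, c2, c3 = df['col1'].to_numpy(), df['col2'].to_numpy(), df['col3'].to_numpy()

des = [0]  # the first row of a new frame always starts at 0
count = 0
for i in range(1, len(df)):
    if c1[i - 1] == c1[i] and c2[i - 1] == c2[i] and c3[i - 1] == 1:
        count += 1
    else:
        count = 0
    des.append(count)

df['desired'] = des
print(df['desired'].tolist())
# [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
```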
A bigger dataset to test on: https://www.dropbox.com/s/hbafcq6hdkh4r9r/test.csv?dl=0
Upvotes: 4
Views: 4140
Reputation: 26221
Since the OP wants a rolling count, that seems to indicate that they want to handle groups that may repeat at different places in the df, without lumping them all together (as a groupby would do).

From the problem statement, it sounds more like a cumulative operation with resets. That is a class of problems that is relatively easy to vectorize, and it doesn't involve groupby at all.
Here is a way using numpy:

def cum_count(df, groupcols, zerocol):
    a = df[groupcols].values
    c = df[zerocol].values
    # find indices where the cumulative count must reset:
    # r0: any of the group columns changed; r1: the previous value of zerocol was 0
    r0 = np.concatenate(([True], np.any(a[1:] != a[:-1], axis=1)))
    r1 = np.concatenate(([True], c[:-1] == 0))
    reset = np.nonzero(r0 | r1)[0]
    # offset: values to subtract at the reset indices
    off = np.concatenate(([0], np.diff(reset)))
    # add 1 everywhere, except at reset indices where we also subtract
    # the length of the segment that just ended
    delt = np.ones(df.shape[0], dtype=int)
    delt[reset] -= off
    return np.cumsum(delt) - 1
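The delt/reset trick deserves a word of explanation: take a cumulative sum of ones, and at every reset index subtract the length of the segment that just ended, so the running total drops back to its value at the start of the segment. A minimal standalone illustration, with reset points chosen by hand:

```python
import numpy as np

# count up by 1 per position, restarting at positions 0, 3 and 7
reset = np.array([0, 3, 7])
n = 10

off = np.concatenate(([0], np.diff(reset)))  # lengths of the segments just ended: [0, 3, 4]
delt = np.ones(n, dtype=int)
delt[reset] -= off                           # delt = [1, 1, 1, -2, 1, 1, 1, -3, 1, 1]
out = np.cumsum(delt) - 1

print(out)
# [0 1 2 0 1 2 3 0 1 2]
```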
And here is a corner-case example:

df = pd.DataFrame([
    ['a', 1, 1, 0],
    ['a', 1, 1, 1],
    ['a', 1, 1, 2],
    ['b', 1, 1, 0],
    ['b', 1, 1, 1],
    ['b', 1, 1, 2],
    ['a', 1, 1, 0],
    ['a', 1, 1, 1],
    ['a', 1, 1, 2],
], columns='col1 col2 col3 desired'.split())

c = cum_count(df, 'col1 col2'.split(), 'col3')
assert np.all(c == df['desired'])
print(c)
# out
[0 1 2 0 1 2 0 1 2]
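The function also reproduces the desired column from the question's sample frame:

```python
import numpy as np
import pandas as pd

def cum_count(df, groupcols, zerocol):
    a = df[groupcols].values
    c = df[zerocol].values
    # reset where the group columns change or the previous zerocol value was 0
    r0 = np.concatenate(([True], np.any(a[1:] != a[:-1], axis=1)))
    r1 = np.concatenate(([True], c[:-1] == 0))
    reset = np.nonzero(r0 | r1)[0]
    off = np.concatenate(([0], np.diff(reset)))
    delt = np.ones(df.shape[0], dtype=int)
    delt[reset] -= off
    return np.cumsum(delt) - 1

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                   'col2': [1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2],
                   'col3': [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0],
                   'desired': [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]})

c = cum_count(df, ['col1', 'col2'], 'col3')
assert np.all(c == df['desired'])
print(c)
# [0 1 2 0 0 0 0 0 0 0 1 0 1]
```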
Upvotes: 1
Reputation: 862851
Use groupby with shift first and then count consecutive 1s:
a = df.groupby(['col1','col2'])['col3'].shift().fillna(0).eq(1)
b = a.cumsum()
df['desired'] = b-b.where(~a).ffill().fillna(0).astype(int)
print(df.head(20))
col1 col2 col3 desired
0 100055 1 1 0
1 100055 1 0 1
2 100055 1 0 0
3 100055 1 0 0
4 100055 1 0 0
5 100055 1 0 0
6 100055 1 0 0
7 100055 1 0 0
8 100055 1 0 0
9 100055 1 0 0
10 100055 1 1 0
11 100055 1 1 1
12 100055 1 0 2
13 100055 1 1 0
14 100055 1 1 1
15 100055 1 0 2
16 100055 1 0 0
17 100055 1 1 0
18 100055 1 0 1
19 100055 1 1 0
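The b - b.where(~a).ffill() expression is a common pandas idiom for counting consecutive True values: b.where(~a).ffill() carries forward the cumulative sum as of the last False, so subtracting it restarts the count after each break. A standalone sketch on a plain boolean Series:

```python
import pandas as pd

a = pd.Series([True, True, False, True, True, True, False])

b = a.cumsum()                                      # 1, 2, 2, 3, 4, 5, 5
# b frozen at the last False, carried forward:        0, 0, 2, 2, 2, 2, 5
runs = b - b.where(~a).ffill().fillna(0).astype(int)

print(runs.tolist())
# [1, 2, 0, 1, 2, 3, 0]
```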
Upvotes: 6