Justin
Justin

Reputation: 79

How to calculate running total and reset when value change with Python?

I want to calculate running total of Promo, and reset running total when Promo changes. How can I achieve this with Python and Pandas? Thanks very much!

    Id        Date  Promo  Running_Total
0   19  2015-07-09      0              0
1   18  2015-07-10      0              0
2   17  2015-07-11      0              0
3   16  2015-07-13      1              1
4   15  2015-07-14      1              2
5   14  2015-07-15      1              3
6   13  2015-07-16      1              4
7   12  2015-07-17      1              5
8   11  2015-07-18      0              0
9   10  2015-07-20      0              0
10   9  2015-07-21      0              0
11   8  2015-07-22      0              0
12   7  2015-07-23      0              0
13   6  2015-07-24      0              0
14   5  2015-07-25      0              0
15   4  2015-07-27      1              1
16   3  2015-07-28      1              2
17   2  2015-07-29      1              3
18   1  2015-07-30      1              4
19   0  2015-07-31      1              5

Upvotes: 4

Views: 1470

Answers (1)

jezrael
jezrael

Reputation: 863166

Completely changed solution(s):

Values of column Promo was changed with 2 and 3.

For count consecutives all values use compare by ne (!=) by shifted column with cumsum for groups.

Then use groupby by this groups and count by cumcount and add 1 for count from 1:

df['Running_Total'] = (df.groupby(df['Promo'].ne(df['Promo'].shift()).cumsum())
                          .cumcount()
                          .add(1))
print (df)
    Id        Date  Promo  Running_Total
0   19  2015-07-09      0              1
1   18  2015-07-10      0              2
2   17  2015-07-11      0              3
3   16  2015-07-13      2              1
4   15  2015-07-14      2              2
5   14  2015-07-15      2              3
6   13  2015-07-16      1              1
7   12  2015-07-17      1              2
8   11  2015-07-18      0              1
9   10  2015-07-20      0              2
10   9  2015-07-21      0              3
11   8  2015-07-22      0              4
12   7  2015-07-23      3              1
13   6  2015-07-24      3              2
14   5  2015-07-25      3              3
15   4  2015-07-27      1              1
16   3  2015-07-28      1              2
17   2  2015-07-29      1              3
18   1  2015-07-30      1              4
19   0  2015-07-31      1              5

But if need replace rows with 0 in Promo column multiple column by boolean mask - comparing df['Promo'].ne(0) - it multiple by 0 all 0 rows and all another by 1:

df['Running_Total'] = df.groupby(df['Promo'].ne(df['Promo'].shift()).cumsum())
                        .cumcount()
                        .add(1)
                        .mul(df['Promo'].ne(0))
  print (df)

0   19  2015-07-09      0              0
1   18  2015-07-10      0              0
2   17  2015-07-11      0              0
3   16  2015-07-13      2              1
4   15  2015-07-14      2              2
5   14  2015-07-15      2              3
6   13  2015-07-16      1              1
7   12  2015-07-17      1              2
8   11  2015-07-18      0              0
9   10  2015-07-20      0              0
10   9  2015-07-21      0              0
11   8  2015-07-22      0              0
12   7  2015-07-23      3              1
13   6  2015-07-24      3              2
14   5  2015-07-25      3              3
15   4  2015-07-27      1              1
16   3  2015-07-28      1              2
17   2  2015-07-29      1              3
18   1  2015-07-30      1              4
19   0  2015-07-31      1              5

Detail:

print (df['Promo'].ne(df['Promo'].shift()).cumsum())
0     1
1     1
2     1
3     2
4     2
5     2
6     3
7     3
8     4
9     4
10    4
11    4
12    5
13    5
14    5
15    6
16    6
17    6
18    6
19    6
Name: Promo, dtype: int32

Upvotes: 2

Related Questions