Reputation: 79
I want to calculate running total of Promo, and reset running total when Promo changes. How can I achieve this with Python and Pandas? Thanks very much!
Id Date Promo Running_Total
0 19 2015-07-09 0 0
1 18 2015-07-10 0 0
2 17 2015-07-11 0 0
3 16 2015-07-13 1 1
4 15 2015-07-14 1 2
5 14 2015-07-15 1 3
6 13 2015-07-16 1 4
7 12 2015-07-17 1 5
8 11 2015-07-18 0 0
9 10 2015-07-20 0 0
10 9 2015-07-21 0 0
11 8 2015-07-22 0 0
12 7 2015-07-23 0 0
13 6 2015-07-24 0 0
14 5 2015-07-25 0 0
15 4 2015-07-27 1 1
16 3 2015-07-28 1 2
17 2 2015-07-29 1 3
18 1 2015-07-30 1 4
19 0 2015-07-31 1 5
Upvotes: 4
Views: 1470
Reputation: 863166
Completely changed solution(s):
Values of column Promo
was changed with 2
and 3
.
For count consecutives all values use compare by ne
(!=
) by shift
ed column with cumsum
for groups.
Then use groupby
by this groups and count by cumcount
and add 1
for count from 1
:
df['Running_Total'] = (df.groupby(df['Promo'].ne(df['Promo'].shift()).cumsum())
.cumcount()
.add(1))
print (df)
Id Date Promo Running_Total
0 19 2015-07-09 0 1
1 18 2015-07-10 0 2
2 17 2015-07-11 0 3
3 16 2015-07-13 2 1
4 15 2015-07-14 2 2
5 14 2015-07-15 2 3
6 13 2015-07-16 1 1
7 12 2015-07-17 1 2
8 11 2015-07-18 0 1
9 10 2015-07-20 0 2
10 9 2015-07-21 0 3
11 8 2015-07-22 0 4
12 7 2015-07-23 3 1
13 6 2015-07-24 3 2
14 5 2015-07-25 3 3
15 4 2015-07-27 1 1
16 3 2015-07-28 1 2
17 2 2015-07-29 1 3
18 1 2015-07-30 1 4
19 0 2015-07-31 1 5
But if need replace rows with 0
in Promo
column multiple column by boolean mask - comparing df['Promo'].ne(0)
- it multiple by 0
all 0
rows and all another by 1
:
df['Running_Total'] = df.groupby(df['Promo'].ne(df['Promo'].shift()).cumsum())
.cumcount()
.add(1)
.mul(df['Promo'].ne(0))
print (df)
0 19 2015-07-09 0 0
1 18 2015-07-10 0 0
2 17 2015-07-11 0 0
3 16 2015-07-13 2 1
4 15 2015-07-14 2 2
5 14 2015-07-15 2 3
6 13 2015-07-16 1 1
7 12 2015-07-17 1 2
8 11 2015-07-18 0 0
9 10 2015-07-20 0 0
10 9 2015-07-21 0 0
11 8 2015-07-22 0 0
12 7 2015-07-23 3 1
13 6 2015-07-24 3 2
14 5 2015-07-25 3 3
15 4 2015-07-27 1 1
16 3 2015-07-28 1 2
17 2 2015-07-29 1 3
18 1 2015-07-30 1 4
19 0 2015-07-31 1 5
Detail:
print (df['Promo'].ne(df['Promo'].shift()).cumsum())
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 3
8 4
9 4
10 4
11 4
12 5
13 5
14 5
15 6
16 6
17 6
18 6
19 6
Name: Promo, dtype: int32
Upvotes: 2