Laurent TEMO

Reputation: 83

Python Group BY Cumsum

I have this DataFrame:

Value  Month
 0       1
 1       2
 8       3
 11      4
 12      5
 17      6
 0       7
 0       8
 0       9
 0       10
 1       11
 2       12
 7       1
 3       2
 1       3
 0       4
 0       5

I want to create a new column "Cumsum" like this:

Value  Month  Cumsum
 0       1       0
 1       2       1
 8       3       9 
 11      4       20
 12      5       32
 17      6
 0       7
 0       8       ...
 0       9
 0       10
 1       11
 2       12
 7       1       7
 3       2       10
 1       3       11
 0       4       11
 0       5       11

Sorry if the formatting is not clean; I could not manage to include my DataFrame properly.

My problem is that I do not have just 12 rows (one row per month); I have many more. However, I know that my table is ordered, and I want the cumulative sum to run up to month 12 and then restart whenever month 1 appears.

Thank you for your help.

Upvotes: 2

Views: 161

Answers (2)

Abhi

Reputation: 4233

Try:

df['Cumsum'] = df.groupby((df.Month == 1).cumsum())['Value'].cumsum()
print(df)

     Value  Month   Cumsum
0      0      1       0
1      1      2       1
2      8      3       9
3     11      4      20
4     12      5      32
5     17      6      49
6      0      7      49 
7      0      8      49
8      0      9      49
9      0     10      49
10     1     11      50
11     2     12      52
12     7      1       7
13     3      2      10
14     1      3      11
15     0      4      11
16     0      5      11
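To see why this works, it may help to split the one-liner into steps. The sketch below rebuilds the DataFrame from the question; the intermediate names `year_start` and `year_id` are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [0, 1, 8, 11, 12, 17, 0, 0, 0, 0, 1, 2, 7, 3, 1, 0, 0],
    "Month": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5],
})

# True on every row where a new year starts (Month == 1).
year_start = df["Month"].eq(1)

# Running count of year starts: 1 for the first year, 2 for the second, ...
# (illustrative helper name, not from the original answer).
year_id = year_start.cumsum()

# Cumulative sum of Value, restarted inside each year_id group.
df["Cumsum"] = df.groupby(year_id)["Value"].cumsum()

print(df)
```

Because the grouping key is derived from the data, this does not depend on each year having exactly 12 rows; it does assume the rows are in chronological order and that every year begins with a month 1 row.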

Upvotes: 2

Nihal

Reputation: 5334

Code (this assumes every year occupies exactly 12 consecutive rows, starting at month 1):

import pandas as pd

df = pd.DataFrame({'value': [0, 1, 8, 11, 12, 17, 0, 0, 0, 0, 1, 2, 7, 3, 1, 0, 0],
                   'month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5]})

# Number of complete 12-row blocks (one block per full year).
n_full = len(df) // 12
for i in range(n_full + 1):
    start = i * 12
    if i < n_full:
        # .loc slicing is label-based and inclusive, hence the -1.
        end = (i + 1) * 12 - 1
        df.loc[start:end, 'cumsum'] = df.loc[start:end, 'value'].cumsum()
    else:
        # Remaining rows of the last, possibly incomplete, year.
        df.loc[start:, 'cumsum'] = df.loc[start:, 'value'].cumsum()

print(df)

Output:

    value  month  cumsum
0       0      1     0.0
1       1      2     1.0
2       8      3     9.0
3      11      4    20.0
4      12      5    32.0
5      17      6    49.0
6       0      7    49.0
7       0      8    49.0
8       0      9    49.0
9       0     10    49.0
10      1     11    50.0
11      2     12    52.0
12      7      1     7.0
13      3      2    10.0
14      1      3    11.0
15      0      4    11.0
16      0      5    11.0
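The fixed 12-row slicing above breaks if a month is missing from the data. As a hedged alternative, the sketch below starts a new group whenever the month number decreases, so it does not need a month 1 row either. The data here is hypothetical (made up for illustration, with the second year starting in February), and the approach still assumes the rows are sorted chronologically.

```python
import pandas as pd

# Hypothetical data: the second year has no month 1 row.
df = pd.DataFrame({
    "value": [5, 1, 2, 7, 3, 1],
    "month": [10, 11, 12, 2, 3, 4],
})

# A drop in the month number marks the start of a new year.
# The first row's diff is NaN, which compares as False.
new_year = df["month"].diff() < 0

# Running count of year changes: 0 for the first year, 1 for the next, ...
year_id = new_year.cumsum()

# Cumulative sum of value, restarted within each detected year.
df["cumsum"] = df.groupby(year_id)["value"].cumsum()

print(df)
```

Unlike the loop, this keeps the integer dtype of the `value` column, since the new column is assigned in one step rather than filled slice by slice.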

Upvotes: 1
