Reputation: 6260
I have a weird raw dataframe in pandas, which looks like:
class year period type info valuebasic value1 value2 value3... value16
A 2019 16 A bla 10 1 0 0 2
A 2019 32 A bla 20 0 1 1 1
...
A 2021 16 A bla 30 30 1 2 1
...
B 2019 16 A bla 1 1 1 1 1
And I want to calculate the cumulative szn fir every class year and period for the value columns, so that I have it for the period 17-32 and so on, which ends up in the following dataframe:
class year period type info value
A 2019 16 A bla 10
A 2019 17 A bla 11 (10+1)
A 2019 18 A bla 11 (10+1+0)
....
A 2019 32 A bla 20 (10+1+0+0+...+2)
A 2019 32 A bla 20
...
A 2021 16 A bla 30
.....
B 2019 16 A bla 1
....
B 2019 32 A bla 24 (1+1+...+1)
Upvotes: 0
Views: 66
Reputation: 862731
Use melt
for unpivot with GroupBy.cumsum
:
df = (df.melt(['class','year','period','valuebasic'],
ignore_index=False)
.sort_index(kind='mergesort', ignore_index=True)
.drop('variable', axis=1))
df['value1'] = df['valuebasic'] + df.groupby(['class','year','period'])['value'].cumsum()
print (df)
class year period valuebasic value value1
0 A 2019 16 10 1 11
1 A 2019 16 10 0 11
2 A 2019 16 10 0 11
3 A 2019 16 10 2 13
4 A 2019 32 20 0 20
5 A 2019 32 20 1 21
6 A 2019 32 20 1 22
7 A 2019 32 20 1 23
8 A 2021 16 30 30 60
9 A 2021 16 30 1 61
10 A 2021 16 30 2 63
11 A 2021 16 30 1 64
12 B 2019 16 1 1 2
13 B 2019 16 1 1 3
14 B 2019 16 1 1 4
15 B 2019 16 1 1 5
Alternative solution:
df = (df.set_index(['class','year','period','valuebasic'])
.stack()
.reset_index(level=4, drop=True)
.reset_index(name='value'))
df['value1'] = df['valuebasic'] + df.groupby(['class','year','period'])['value'].cumsum()
print (df)
class year period valuebasic value value1
0 A 2019 16 10 1 11
1 A 2019 16 10 0 11
2 A 2019 16 10 0 11
3 A 2019 16 10 2 13
4 A 2019 32 20 0 20
5 A 2019 32 20 1 21
6 A 2019 32 20 1 22
7 A 2019 32 20 1 23
8 A 2021 16 30 30 60
9 A 2021 16 30 1 61
10 A 2021 16 30 2 63
11 A 2021 16 30 1 64
12 B 2019 16 1 1 2
13 B 2019 16 1 1 3
14 B 2019 16 1 1 4
15 B 2019 16 1 1 5
For repalce original column value
use:
df['value'] = df['valuebasic'] + df.groupby(['class','year','period'])['value'].cumsum()
Upvotes: 1