Reputation: 69
I have the following table where Date is the index column:
Date | ColA | ColB |
---|---|---|
2021-01-01 | 10 | 40 |
2021-01-02 | 20 | 15 |
2022-01-01 | 10 | 40 |
2022-01-02 | 20 | 15 |
I want to add row-subtotals to show yearly summation:
Date | ColA | ColB |
---|---|---|
2021-01-01 | 10 | 40 |
2021-01-02 | 20 | 15 |
2021 Subtotal | 30 | 55 |
2022-01-02 | 20 | 15 |
2021-01-02 | 20 | 15 |
2022 Subtotal | 40 | 30 |
I tried the following code:
df.groupby(level='Date').transform("sum")
However I am getting the following error:
DateFormatError: invalid date '2022 Subtotal'
Any alternatives to achieve similar results?
Upvotes: 2
Views: 160
Reputation: 261270
You can use:
# grouper for years
# convert to datetime and extract year
# convert to string and add ' Subtotal'
group = (pd.to_datetime(df['Date']).dt.year
.astype(str)
.add(' Subtotal')
)
# perform groupby.sum using the above grouper
# concatenation with original data
# sorting by year to move the subtotals below their respective data
out = (pd.concat(
[df,
df.groupby(group)
.sum().reset_index()])
.sort_values(by='Date',
key=lambda x: x.str.extract('(\d+)', expand=False),
kind='stable'
)
)
Output:
Date ColA ColB
0 2021-01-01 10 40
1 2021-01-02 20 15
0 2021 Subtotal 30 55
2 2022-01-01 10 40
3 2022-01-02 20 15
1 2022 Subtotal 30 55
Upvotes: 2