Reputation: 97
I have this DataFrame:
import pandas as pd
import numpy as np
from pandas import DataFrame
df3 = pd.DataFrame({
    'MONTHYEAR': ['2021/01', '2021/02', '2021/03', '2021/01', '2021/02', '2021/03', '2022/01'],
    'CATEGORY': ['INCOME', 'INCOME', 'INCOME', 'INCOME', 'INCOME', 'INCOME', 'INCOME'],
    'SUBCATEGORY': ['INCOME HD', 'INCOME HD', 'INCOME HD', 'INCOME AD', 'INCOME AD', 'INCOME AD', 'INCOME AD'],
    'AMOUNT': [1000, 2000, 3000, 4000, 5000, 6000, 7000]
})
I want to add 3 new columns: HD, AD and TOTAL.
df3['HD'] = 0
df3['AD'] = 0
df3['TOTAL'] = 0
df3['TOTAL'] = df3['AMOUNT'].groupby(df3['MONTHYEAR']).transform('sum')
df3.loc[df3['SUBCATEGORY'] == "INCOME HD", 'HD'] = df3['AMOUNT']
df3.loc[df3['SUBCATEGORY'] == "INCOME AD", 'AD'] = df3['AMOUNT']
df3
So far I get this:
But what I want is this:
Any help is much appreciated!
Upvotes: 2
Views: 59
Reputation: 1381
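You can do this with the .agg() function after grouping by MONTHYEAR. This builds on the HD, AD and TOTAL columns you have already created.
Here's the code: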
df3 = df3.groupby(['MONTHYEAR']).agg({'CATEGORY':'first', 'HD':'sum', 'AD':'sum', 'TOTAL':'first'}).reset_index()
The output will look like this:
  MONTHYEAR CATEGORY    HD    AD  TOTAL
0   2021/01   INCOME  1000  4000   5000
1   2021/02   INCOME  2000  5000   7000
2   2021/03   INCOME  3000  6000   9000
3   2022/01   INCOME     0  7000   7000
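The one-liner above depends on the helper columns from the question. As a minimal self-contained sketch, assuming the same df3 data as in the question, the helper columns and the aggregation could be written together like this. The name out, and the use of Series.where and named aggregation, are choices made here for illustration and are not taken from the answer itself:

```python
import pandas as pd

df3 = pd.DataFrame({
    'MONTHYEAR': ['2021/01', '2021/02', '2021/03', '2021/01', '2021/02', '2021/03', '2022/01'],
    'CATEGORY': ['INCOME'] * 7,
    'SUBCATEGORY': ['INCOME HD'] * 3 + ['INCOME AD'] * 4,
    'AMOUNT': [1000, 2000, 3000, 4000, 5000, 6000, 7000],
})

# Helper columns: keep AMOUNT where the subcategory matches, otherwise 0
df3['HD'] = df3['AMOUNT'].where(df3['SUBCATEGORY'] == 'INCOME HD', 0)
df3['AD'] = df3['AMOUNT'].where(df3['SUBCATEGORY'] == 'INCOME AD', 0)

# One row per MONTHYEAR; TOTAL is summed directly from AMOUNT
out = (df3.groupby('MONTHYEAR', as_index=False)
          .agg(CATEGORY=('CATEGORY', 'first'),
               HD=('HD', 'sum'),
               AD=('AD', 'sum'),
               TOTAL=('AMOUNT', 'sum')))
print(out)
```

Summing AMOUNT directly for TOTAL avoids the separate transform('sum') step, at the cost of differing slightly from the original snippet.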
Upvotes: 1
Reputation: 862631
Use DataFrame.pivot_table first, then rename the columns and create the new TOTAL column with sum, and finally convert the MultiIndex to columns:
df1 = (df3.pivot_table(index=['MONTHYEAR','CATEGORY'],
                       columns='SUBCATEGORY',
                       values='AMOUNT',
                       aggfunc='sum',
                       fill_value=0)
          .rename(columns={'INCOME AD':'AD','INCOME HD':'HD'})
          [['HD','AD']]
          .assign(TOTAL = lambda x: x.sum(axis=1))
          .reset_index()
          .rename_axis(None, axis=1)
       )
print (df1)
  MONTHYEAR CATEGORY    HD    AD  TOTAL
0   2021/01   INCOME  1000  4000   5000
1   2021/02   INCOME  2000  5000   7000
2   2021/03   INCOME  3000  6000   9000
3   2022/01   INCOME     0  7000   7000
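One detail worth noting is fill_value=0. 2022/01 has no INCOME HD row, so without it that cell would come out as NaN rather than 0. Here is a small sketch of the difference, assuming the same data as in the question. The names wide_nan and wide_zero are only for illustration:

```python
import pandas as pd

df3 = pd.DataFrame({
    'MONTHYEAR': ['2021/01', '2021/02', '2021/03', '2021/01', '2021/02', '2021/03', '2022/01'],
    'CATEGORY': ['INCOME'] * 7,
    'SUBCATEGORY': ['INCOME HD'] * 3 + ['INCOME AD'] * 4,
    'AMOUNT': [1000, 2000, 3000, 4000, 5000, 6000, 7000],
})

# Without fill_value, the missing (2022/01, INCOME HD) combination is NaN
wide_nan = df3.pivot_table(index='MONTHYEAR', columns='SUBCATEGORY',
                           values='AMOUNT', aggfunc='sum')

# With fill_value=0, the same cell is filled with 0
wide_zero = df3.pivot_table(index='MONTHYEAR', columns='SUBCATEGORY',
                            values='AMOUNT', aggfunc='sum', fill_value=0)

print(wide_nan.loc['2022/01', 'INCOME HD'])
print(wide_zero.loc['2022/01', 'INCOME HD'])
```

Because sum(axis=1) skips NaN by default, TOTAL would probably still be right without fill_value, but the HD column would then show a missing value instead of the 0 in the desired output.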
Upvotes: 1