Reputation: 31
I have a dataframe as below:
data =[['A','ABC001','18M01',1,3],['A','ABC002','18M01',2,4],['A','ABC001','18M02',3,3],['B','ABC001','18M01',4,3],['B','ABC002','18M02',5,4],['B','ABC002','18M02',6,4]]
df = pd.DataFrame(data,columns=['Type','Product','Month','Requirement','Inventory'])
df
Input:
Type Product Month Requirement Inventory
A ABC001 18M01 1 3
A ABC002 18M01 2 4
A ABC001 18M02 3 3
B ABC001 18M01 4 3
B ABC002 18M02 5 4
B ABC002 18M02 6 4
What i trying to do is summarize it into a dataframe somehow like this
Output:
Type Product Values 18M01 18M02
A ABC001 Sum of Requirement 1 3
A ABC001 Average of Inventory 3 3
A ABC002 Sum of Requirement 2 NaN
A ABC002 Average of Inventory 4 NaN
B ABC001 Sum of Requirement 4 NaN
B ABC001 Average of Inventory 3 NaN
B ABC002 Sum of Requirement NaN 11
B ABC002 Average of Inventory NaN 4
I can create this in pivot excel very easily but no idea at all when come to using pandas pivot. Please help
Upvotes: 3
Views: 359
Reputation: 9081
A pivot_table
way of doing things -
df1 = df.pivot_table('Requirement', ['Type','Product'], 'Month', aggfunc='sum')
df1['Values'] = 'Sum of Requirement'
df2 = df.pivot_table('Inventory', ['Type','Product'], 'Month', aggfunc='mean')
df2['Values'] = 'Average of Inventory'
df1.append(df2)
Output
Month 18M01 18M02 Values
Type Product
A ABC001 1.0 3.0 Sum of Requirement
ABC002 2.0 NaN Sum of Requirement
B ABC001 4.0 NaN Sum of Requirement
ABC002 NaN 11.0 Sum of Requirement
A ABC001 3.0 3.0 Average of Inventory
ABC002 4.0 NaN Average of Inventory
B ABC001 3.0 NaN Average of Inventory
ABC002 NaN 4.0 Average of Inventory
You can throw in a reset_index()
to make it better -
df1.append(df2).reset_index()
Month Type Product 18M01 18M02 Values
0 A ABC001 1.0 3.0 Sum of Requirement
1 A ABC002 2.0 NaN Sum of Requirement
2 B ABC001 4.0 NaN Sum of Requirement
3 B ABC002 NaN 11.0 Sum of Requirement
4 A ABC001 3.0 3.0 Average of Inventory
5 A ABC002 4.0 NaN Average of Inventory
6 B ABC001 3.0 NaN Average of Inventory
7 B ABC002 NaN 4.0 Average of Inventory
Upvotes: 1
Reputation: 862761
I think you need aggregate by sum
and mean
, flatten MultiIndex in column and reshape by stack
with unstack
:
df1 = (df.groupby(['Type','Product','Month'])
.agg({'Requirement': 'sum','Inventory':'mean'})
.rename(columns={'Requirement':'Sum of Requirement',
'Inventory':'Average of Inventory'})
.stack()
.unstack(2)
.reset_index()
.rename(columns={'level_2':'Values'}))
print (df1)
Month Type Product Values 18M01 18M02
0 A ABC001 Sum of Requirement 1.0 3.0
1 A ABC001 Average of Inventory 3.0 3.0
2 A ABC002 Sum of Requirement 2.0 NaN
3 A ABC002 Average of Inventory 4.0 NaN
4 B ABC001 Sum of Requirement 4.0 NaN
5 B ABC001 Average of Inventory 3.0 NaN
6 B ABC002 Sum of Requirement NaN 11.0
7 B ABC002 Average of Inventory NaN 4.0
Upvotes: 1