HYC
HYC

Reputation: 31

Summarize pandas dataframe row values into average and sum

I have a dataframe as below:

data =[['A','ABC001','18M01',1,3],['A','ABC002','18M01',2,4],['A','ABC001','18M02',3,3],['B','ABC001','18M01',4,3],['B','ABC002','18M02',5,4],['B','ABC002','18M02',6,4]]
df = pd.DataFrame(data,columns=['Type','Product','Month','Requirement','Inventory'])
df

Input:

Type Product Month Requirement Inventory
A    ABC001  18M01     1         3
A    ABC002  18M01     2         4
A    ABC001  18M02     3         3
B    ABC001  18M01     4         3
B    ABC002  18M02     5         4
B    ABC002  18M02     6         4

What i trying to do is summarize it into a dataframe somehow like this

Output:

Type    Product           Values         18M01    18M02
A       ABC001      Sum of Requirement     1       3
A       ABC001      Average of Inventory   3       3
A       ABC002      Sum of Requirement     2      NaN
A       ABC002      Average of Inventory   4      NaN
B       ABC001      Sum of Requirement     4      NaN
B       ABC001      Average of Inventory   3      NaN
B       ABC002      Sum of Requirement    NaN      11
B       ABC002      Average of Inventory  NaN      4

I can create this in pivot excel very easily but no idea at all when come to using pandas pivot. Please help

Upvotes: 3

Views: 359

Answers (2)

Vivek Kalyanarangan
Vivek Kalyanarangan

Reputation: 9081

A pivot_table way of doing things -

df1 = df.pivot_table('Requirement', ['Type','Product'], 'Month', aggfunc='sum')
df1['Values'] = 'Sum of Requirement'
df2 = df.pivot_table('Inventory', ['Type','Product'], 'Month', aggfunc='mean')
df2['Values'] = 'Average of Inventory'

df1.append(df2)

Output

Month         18M01  18M02                Values
Type Product
A    ABC001     1.0    3.0    Sum of Requirement
     ABC002     2.0    NaN    Sum of Requirement
B    ABC001     4.0    NaN    Sum of Requirement
     ABC002     NaN   11.0    Sum of Requirement
A    ABC001     3.0    3.0  Average of Inventory
     ABC002     4.0    NaN  Average of Inventory
B    ABC001     3.0    NaN  Average of Inventory
     ABC002     NaN    4.0  Average of Inventory

You can throw in a reset_index() to make it better -

df1.append(df2).reset_index()

Month Type Product  18M01  18M02                Values
0        A  ABC001    1.0    3.0    Sum of Requirement
1        A  ABC002    2.0    NaN    Sum of Requirement
2        B  ABC001    4.0    NaN    Sum of Requirement
3        B  ABC002    NaN   11.0    Sum of Requirement
4        A  ABC001    3.0    3.0  Average of Inventory
5        A  ABC002    4.0    NaN  Average of Inventory
6        B  ABC001    3.0    NaN  Average of Inventory
7        B  ABC002    NaN    4.0  Average of Inventory

Upvotes: 1

jezrael
jezrael

Reputation: 862761

I think you need aggregate by sum and mean, flatten MultiIndex in column and reshape by stack with unstack:

df1 = (df.groupby(['Type','Product','Month'])
         .agg({'Requirement': 'sum','Inventory':'mean'})
         .rename(columns={'Requirement':'Sum of Requirement',
                          'Inventory':'Average of Inventory'})
         .stack()
         .unstack(2)
         .reset_index()
         .rename(columns={'level_2':'Values'}))

print (df1)
Month Type Product                Values  18M01  18M02
0        A  ABC001    Sum of Requirement    1.0    3.0
1        A  ABC001  Average of Inventory    3.0    3.0
2        A  ABC002    Sum of Requirement    2.0    NaN
3        A  ABC002  Average of Inventory    4.0    NaN
4        B  ABC001    Sum of Requirement    4.0    NaN
5        B  ABC001  Average of Inventory    3.0    NaN
6        B  ABC002    Sum of Requirement    NaN   11.0
7        B  ABC002  Average of Inventory    NaN    4.0

Upvotes: 1

Related Questions