Hassan
Hassan

Reputation: 41

Aggregate a Data Frame Python

I have a data frame like this:

col prev_col prev_table prev_tol table count

C_ShipPostalCode C_PostalCode T_customers 0 T_orders 2

C_ShipPostalCode C_PostalCode T_customers 2 T_orders 2

C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1

C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1

C_ShipPostalCode C_PostalCode T_customers 0 T_orders 2

C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1

C_ShipPostalCode C_PostalCode T_customers 1 T_orders 1

C_SupplierID C_UnitPrice T_products 1 T_suppliers 3

C_SupplierID C_UnitPrice T_products 2 T_suppliers 2

I want to turn this dataframe to something like below:

col prev_col prev_table table total prev_tol count

C_ShipPostalCode C_PostalCode T_customers T_orders 6 3 10

C_SupplierID C_UnitPrice T_products T_suppliers 2 3 5

As you can see I want to group by 4 columns but also want to add the prev_tol and count from the original dataframe.

Upvotes: 1

Views: 520

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Let's try:

df_out = df.groupby(['col','prev_col','prev_table']).agg({'prev_tol':'sum','table':['count','max'],'count':'sum'}).reset_index()
df_out.columns = df_out.columns.map('_'.join)
print(df_out)

Output:

               col_     prev_col_  prev_table_  prev_tol_sum  count_sum  \
0  C_ShipPostalCode  C_PostalCode  T_customers             3         10   
1      C_SupplierID   C_UnitPrice   T_products             3          5   

   table_count    table_max  
0            7     T_orders  
1            2  T_suppliers 

Upvotes: 1

Related Questions