Reputation: 2300
I have a dataset on which I would like to run multiple aggregation steps using pandas. This code creates the data:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'S': [200, 100, 300, 400],
                    'Date': pd.to_datetime(['2019-01-01', '2019-01-01', '2019-02-01', '2019-03-01']).date,
                    'Value': [5, 10, 30, 40]})
yielding:
df1:
Name S Date Value
0 A 200 2019-01-01 5
1 A 100 2019-01-01 10
2 B 300 2019-02-01 30
3 B 400 2019-03-01 40
The final result of the aggregations should look like this:
              2019-01-01  2019-02-01  2019-03-01
A  100, 200           15
B  300 - 400                      30          40
The first step I did was
df2 = df1.groupby(by=['Name', 'Date']).agg(
    {'S': lambda x: ', '.join(pd.DataFrame([str(s) for s in x])
                                .drop_duplicates()
                                .sort_values(by=0)
                                .iloc[:, 0]
                                .map(str)),
     'Value': np.sum})
The .join(...) part is a bit convoluted, but it takes the numbers in S, drops duplicates, sorts them, and concatenates them into a string.
The result is this:
df2:
S Value
Name Date
A 2019-01-01 100, 200 15
B 2019-02-01 300 30
2019-03-01 400 40
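Incidentally, the dedup-sort-join lambda can be written more compactly with a set and sorted(); this sketch (my own rewording, not the code above) produces the same df2:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'S': [200, 100, 300, 400],
                    'Date': pd.to_datetime(['2019-01-01', '2019-01-01',
                                            '2019-02-01', '2019-03-01']).date,
                    'Value': [5, 10, 30, 40]})

# Unique values of S as strings, sorted, joined with ', ' -- the same
# result as the DataFrame-based lambda, just shorter.
df2 = df1.groupby(['Name', 'Date']).agg(
    {'S': lambda x: ', '.join(sorted({str(s) for s in x})),
     'Value': 'sum'})
```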
and now I am stuck. I can generate the following:
df3 = (df2.pivot_table('Value', index=['Name', 'S'], columns=['Date'],
aggfunc={'Value': np.sum})
.fillna(0)
.reset_index()
)
df3:
Date Name S 2019-01-01 2019-02-01 2019-03-01
0 A 100, 200 15.0 0.0 0.0
1 B 300 0.0 30.0 0.0
2 B 400 0.0 0.0 40.0
However, I would like the last two lines to be combined, with S becoming 300 - 400 (similar to the join for df2). I have not found out how to combine those aggregations into one step (mixing groupby and pivot_table).
Thanks for the help.
Upvotes: 1
Views: 1087
Reputation: 2300
@Parth gave the correct insight: it is not possible to aggregate the numbers, pivot_table the data, and run a groupby all in one step. Instead, you have to separate the creation of the future index from the pivoting of the data.
Here is my final code (a slight variation on @Parth's):
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'S': [200, 100, 300, 400],
                    'Date': pd.to_datetime(['2019-01-01', '2019-01-01', '2019-02-01', '2019-03-01']).date,
                    'Value': [5, 10, 30, 40]})
print('df1: \n', df1, '\n')
df2 = df1.groupby(by=['Name', 'Date']).agg(
    {'S': lambda x: ', '.join(pd.DataFrame([str(s) for s in x])
                                .drop_duplicates()
                                .sort_values(by=0)
                                .iloc[:, 0]
                                .map(str)),
     'Value': np.sum})
print('df2:\n ', df2, '\n')
ind = (df2.groupby(by=['Name'])
          .agg({'S': lambda x: ' - '.join(pd.DataFrame([str(s) for s in x])
                                            .drop_duplicates()
                                            .sort_values(by=0)
                                            .iloc[:, 0]
                                            .map(str))})
          .reset_index())
print('ind:\n', ind, '\n')
val = df1.groupby(['Name','Date'])['Value'].sum().reset_index() # get aggregate sum of values
print('val:\n', val, '\n')
df3 = (ind.merge(val, on=['Name'])
.pivot_table(index=['Name', 'S'], columns=['Date'], values='Value')
.fillna(0)
)
print('df3\n', df3)
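As a side note: after the merge, every (Name, S, Date) combination is unique, so unstack can replace pivot_table here without any implicit aggregation. A sketch of that variant (same data; the simplified join lambdas are my own, not the code above):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'S': [200, 100, 300, 400],
                    'Date': pd.to_datetime(['2019-01-01', '2019-01-01',
                                            '2019-02-01', '2019-03-01']).date,
                    'Value': [5, 10, 30, 40]})

# Per-(Name, Date) label first (', '-joined), then per-Name (' - '-joined),
# reproducing the two-step labels '100, 200' and '300 - 400'.
per_date = (df1.groupby(['Name', 'Date'])['S']
               .agg(lambda x: ', '.join(sorted({str(s) for s in x}))))
ind = (per_date.groupby('Name')
               .agg(lambda x: ' - '.join(sorted(set(x))))
               .reset_index())

# Aggregated sum of values per Name and Date.
val = df1.groupby(['Name', 'Date'])['Value'].sum().reset_index()

# Every (Name, S, Date) row is unique after the merge, so a plain unstack
# suffices; fill_value=0 replaces the .fillna(0).
df3 = (ind.merge(val, on='Name')
          .set_index(['Name', 'S', 'Date'])['Value']
          .unstack(fill_value=0))
```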
Upvotes: 0
Reputation: 644
Try this:
val = df1.groupby(['Name', 'Date'])['Value'].sum().reset_index()  # get aggregate sum of values
ind = df1.groupby('Name').apply(lambda x: '-'.join([str(i) for i in x.S.values])).reset_index()  # prepare index for target dataframe
target_df = ind.merge(val, on=['Name']).pivot_table(index=['Name', 0], columns=['Date'], values='Value').fillna(0)  # merge both and pivot to get desired output
Then, print(target_df) gives the desired output:
Date 2019-01-01 2019-02-01 2019-03-01
Name 0
A 200-100 15.0 0.0 0.0
B 300-400 0.0 30.0 40.0
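One caveat: x.S.values keeps the original row order and duplicates, which is why A shows up as 200-100 rather than 100-200. Sorting and deduplicating before joining (a small variation of my own on the code above) makes the label deterministic:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'S': [200, 100, 300, 400],
                    'Date': pd.to_datetime(['2019-01-01', '2019-01-01',
                                            '2019-02-01', '2019-03-01']).date,
                    'Value': [5, 10, 30, 40]})

val = df1.groupby(['Name', 'Date'])['Value'].sum().reset_index()
# Sort and deduplicate S before joining so the label is deterministic.
ind = (df1.groupby('Name')
          .apply(lambda x: '-'.join(str(i) for i in sorted(x.S.unique())))
          .reset_index())
target_df = (ind.merge(val, on='Name')
                .pivot_table(index=['Name', 0], columns='Date', values='Value')
                .fillna(0))
```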
Upvotes: 1