gomeslhlima
gomeslhlima

Reputation: 141

Pandas: Combine aggregated columns with not Aggregated columns in the same function call

I have a DataFrame with two columns that will be used in group by( GroupBy1 and GroupBy2 ), with dozens of columns that will be measures using agg() ( MesA:Max, MesB:Min, MesC:sum.. ) and other columns that aren't for measures, but for informations like datetime of last row in the groupby, 'GroupName1', 'GroupName2', GroupId of another database, and others.

TicketsDBFrame
GroupBy1 GroupBy2 GroupName1 GroupName2 MesA MesB MesC MesD LastTicketTime       GroupId1 GroupId2
1        1        First      First      2    3    1    6    2021-04-05 01:00:00  4        99
1        1        First      First      4    1    3    2    2021-04-05 02:00:00  4        99
1        1        First      First      2    5    2    1    2021-04-05 03:00:00  4        99
1        2        First      Second     2    5    2    1    2021-04-05 01:30:00  4        75
1        2        First      Second     1    4    7    3    2021-04-05 02:30:00  4        75
2        2        Second     Second     4    2    1    8    2021-04-05 02:00:00  2        75
2        2        Second     Second     1    6    3    1    2021-04-05 04:00:00  2        75

The desired output:

GroupBy1 GroupBy2 GroupName1 GroupName2 MesA MesB MesC MesD LastTicketTime       GroupId1 GroupId2
1        1        First      First      4    1    6    9    2021-04-05 03:00:00  4        99
1        2        First      Second     2    4    4    2    2021-04-05 02:30:00  4        75
2        2        Second     Second     4    2    4    9    2021-04-05 04:00:00  2        75

I already know how to create this desired Frame using derived DataFrames, using 'loc' and 'idxmax' to get LastTicketTime in a frame, other derived frame to 'Ids and Names' and another DataFrame to call agg() for the measure columns, after that I do a merge in the frames

groupInfoFrame:Little Derived Frame created associating GroupBy1 - GroupName1 - GroupId1

lastTicketFrame: Frame with only LastTicketTime

lastTicketFrame=TicketsDBFrame[['GroupBy1','GroupBy2','LastTicketTime' ]]
lastTicketFrame=lastTicketFrame.loc[lastTicketFrame.groupby(['GroupBy1'],['GroupBy2]).LastTicketTime.idxmax() ]

measuresFrame: Only measures

measuresFrame = TicketsDBFrame.groupby(['GroupBy1'],['GroupBy2]).agg( mesA:.....MesD )

After all I do a merge in the measureFrame and lastTicketFrame using the GroupBy1 and GroupBy2 as keys

Is it possible to have all these information in only one agg() or transform() or other function call? Without derived frames and merges

Upvotes: 0

Views: 40

Answers (1)

perl
perl

Reputation: 9941

You can do it in a single agg (by sorting values by LastTicketTime and taking last in agg):

(df
    .sort_values('LastTicketTime')
    .groupby(['GroupBy1', 'GroupBy2'], as_index=False)
    .agg({
        'GroupName1': 'last',
        'GroupName2': 'last',
        'MesA': 'max',
        'MesB': 'min',
        'MesC': 'sum',
        'MesD': 'sum',
        'LastTicketTime': 'last',
        'GroupId1': 'last',
        'GroupId2': 'last'
    }))

Output:

   GroupBy1  GroupBy2 GroupName1 GroupName2  MesA  MesB  MesC  MesD  \
0         1         1      First      First     4     1     6     9   
1         1         2      First     Second     2     4     9     4   
2         2         2     Second     Second     4     2     4     9   

        LastTicketTime  GroupId1  GroupId2  
0  2021-04-05 03:00:00         4        99  
1  2021-04-05 02:30:00         4        75  
2  2021-04-05 04:00:00         2        75

P.S. If I'm not mistaken, there seems to be a problem in your expected output for GroupName1 = First, GroupName2 = Second: MesC and MesD are sums, and should be 9 and 4, respectively (instead of 4 and 2).

Upvotes: 1

Related Questions