user603535

Reputation: 63

Sort Pandas Groupby by a particular column

Data (I've incorporated some extra steps because I receive the data in a particular form):

    import numpy as np
    import pandas as pd

    d1 = pd.DataFrame({"Date" : ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010', '3/10/2009', '1/7/2022', '12/9/2010','12/20/2010','1/13/2022'],
               "Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee', 'PayPal', 'Gasoline', 'Gasoline','Gasoline','Coffee'],
               "Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50,48,9]})

    # Change Date column to datetime
    d1['Date'] = pd.to_datetime(d1['Date'])

    # Create MMM-YY column from Date column
    d1['MMM-YY'] = d1['Date'].dt.strftime('%b') + '-' + d1['Date'].dt.strftime('%y')

    # Sort DataFrame by Date
    d1.sort_values('Date', inplace=True)

    d1

        Date        Expense   Total  MMM-YY
    5   2009-03-10  PayPal    99.00  Mar-09
    7   2010-12-09  Gasoline  50.00  Dec-10
    1   2010-12-15  Food      7.00   Dec-10
    8   2010-12-20  Gasoline  48.00  Dec-10
    4   2010-12-31  Coffee    8.01   Dec-10
    2   2015-06-01  Gasoline  11.00  Jun-15
    0   2022-01-01  Food      3.89   Jan-22
    6   2022-01-07  Gasoline  76.00  Jan-22
    9   2022-01-13  Coffee    9.00   Jan-22
    3   2022-01-31  Coffee    0.99   Jan-22

I want to sum the Total column for every expense type within every month (entry in MMM-YY). Here's the important part: I want to keep the MMM-YY column in increasing order (just like d1 DataFrame), but I want the Expense column to be sorted alphabetically.

Here is the desired output after applying groupby:

    MMM-YY  Expense 
    Mar-09  PayPal      99.00
    Dec-10  Coffee       8.01
            Food         7.00
            Gasoline    98.00
    Jun-15  Gasoline    11.00
    Jan-22  Coffee       9.99
            Food         3.89
            Gasoline    76.00

Notice how the MMM-YY column remains in ascending order, but the expense column is organized alphabetically within each group with multiple rows.

Thank you!

Upvotes: 0

Views: 53

Answers (3)

PTQuoc

Reputation: 1063

What I usually do is compute the result as a flat DataFrame first and then call set_index to build the multilevel index afterwards. This is easier to manipulate.

# Create year and month columns (d1['Date'] must already be datetime)
d1['year'] = d1['Date'].dt.year
d1['month'] = d1['Date'].dt.month

# Group and sum first, then sort the aggregated frame
# (groupby already sorts by its keys by default; the explicit sort just makes the order obvious)
d2 = d1.groupby(['year', 'month', 'Expense'])['Total'].sum().reset_index(name='Sum of Total')
d2.sort_values(by=['year', 'month', 'Expense'], ascending=[True, True, True], ignore_index=True, inplace=True)

# Change back to multilevel index as your desired output
d2.set_index(['year', 'month', 'Expense'], inplace=True)
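
The steps above index the result by numeric year and month rather than by the MMM-YY labels in the desired output. A minimal sketch of one way to get the labels back, assuming the same d1 data as in the question; the names monthly, first_of_month, and labelled are hypothetical and only used for illustration:

```python
import pandas as pd

# Same sample data as in the question
d1 = pd.DataFrame({
    "Date": ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010',
             '3/10/2009', '1/7/2022', '12/9/2010', '12/20/2010', '1/13/2022'],
    "Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee',
                'PayPal', 'Gasoline', 'Gasoline', 'Gasoline', 'Coffee'],
    "Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50, 48, 9],
})
d1['Date'] = pd.to_datetime(d1['Date'], format='%m/%d/%Y')

# Group on numeric year/month so the default key sort is chronological,
# with Expense sorted alphabetically inside each month
monthly = (
    d1.assign(year=d1['Date'].dt.year, month=d1['Date'].dt.month)
      .groupby(['year', 'month', 'Expense'], as_index=False)['Total']
      .sum()
)

# Rebuild a date from year/month (day fixed to 1) only to format the label
first_of_month = pd.to_datetime(monthly[['year', 'month']].assign(day=1))
monthly['MMM-YY'] = first_of_month.dt.strftime('%b-%y')

# Row order is already correct, so set_index keeps it
labelled = monthly.set_index(['MMM-YY', 'Expense'])['Total']
print(labelled)
```

Because the sorting happens on the numeric keys, the string labels never take part in the ordering; they are attached only at the end.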

Hope this helps.

Upvotes: 0

mozway

Reputation: 260380

I would sort before aggregating. The simplest way, IMO, is to sort on a monthly period and then group with sort=False so the groups keep that order:

(d1.assign(m=d1['Date'].dt.to_period('M'))
   .sort_values(by=['m', 'Expense'])
   .groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
 )

Output:

MMM-YY  Expense 
Mar-09  PayPal      99.00
Dec-10  Coffee       8.01
        Food         7.00
        Gasoline    98.00
Jun-15  Gasoline    11.00
Jan-22  Coffee       9.99
        Food         3.89
        Gasoline    76.00
Name: Total, dtype: float64
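
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach with the question's data rebuilt inline. The variable name result is an assumption added for illustration. It relies on groupby(..., sort=False) emitting groups in order of first appearance, which is why the frame is sorted on the period column first:

```python
import pandas as pd

# Same sample data as in the question
d1 = pd.DataFrame({
    "Date": ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010',
             '3/10/2009', '1/7/2022', '12/9/2010', '12/20/2010', '1/13/2022'],
    "Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee',
                'PayPal', 'Gasoline', 'Gasoline', 'Gasoline', 'Coffee'],
    "Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50, 48, 9],
})
d1['Date'] = pd.to_datetime(d1['Date'], format='%m/%d/%Y')
d1['MMM-YY'] = d1['Date'].dt.strftime('%b-%y')

result = (
    d1.assign(m=d1['Date'].dt.to_period('M'))   # sortable month key
      .sort_values(by=['m', 'Expense'])         # chronological, then alphabetical
      .groupby(['MMM-YY', 'Expense'], sort=False)['Total']  # keep the row order
      .sum()
)
print(result)
```

Sorting on the string column MMM-YY directly would not work here, since 'Dec-10' sorts before 'Jan-22' and 'Mar-09' alphabetically rather than chronologically.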

Upvotes: 1

Rabinzel

Reputation: 7903

I couldn't manage to do it in a single groupby, but this is how I get to your desired result (it assumes d1 is already sorted by Date, as in your question):

out = d1.groupby(['MMM-YY', 'Expense'], sort=False)['Total'].sum()
out.groupby(level=0, sort=False, group_keys=False).apply(lambda x: x.sort_index(level='Expense'))

Output:

MMM-YY  Expense 
Mar-09  PayPal      99.00
Dec-10  Coffee       8.01
        Food         7.00
        Gasoline    98.00
Jun-15  Gasoline    11.00
Jan-22  Coffee       9.99
        Food         3.89
        Gasoline    76.00
Name: Total, dtype: float64
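
One possible way to get there with a single groupby is sketched below. This is a different technique from the two-step code above, not a restatement of it: it turns MMM-YY into an ordered categorical whose categories follow the date order, so the default key sort becomes chronological. The names month_order, labels, and out_single are hypothetical:

```python
import pandas as pd

# Same sample data as in the question
d1 = pd.DataFrame({
    "Date": ['1/1/2022', '12/15/2010', '6/1/2015', '1/31/2022', '12/31/2010',
             '3/10/2009', '1/7/2022', '12/9/2010', '12/20/2010', '1/13/2022'],
    "Expense": ['Food', 'Food', 'Gasoline', 'Coffee', 'Coffee',
                'PayPal', 'Gasoline', 'Gasoline', 'Gasoline', 'Coffee'],
    "Total": [3.89, 7.00, 11, 0.99, 8.01, 99, 76, 50, 48, 9],
})
d1['Date'] = pd.to_datetime(d1['Date'], format='%m/%d/%Y')
d1['MMM-YY'] = d1['Date'].dt.strftime('%b-%y')

# Month labels in chronological order of first appearance
month_order = d1.sort_values('Date')['MMM-YY'].unique()

# Ordered categorical: groupby sorts by category order, not alphabetically
labels = pd.Categorical(d1['MMM-YY'], categories=month_order, ordered=True)

out_single = (
    d1.assign(**{'MMM-YY': labels})
      # observed=True keeps only month/expense pairs that actually occur
      .groupby(['MMM-YY', 'Expense'], observed=True)['Total']
      .sum()
)
print(out_single)
```

The first index level of the result is categorical rather than plain strings, which may or may not matter for later processing.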

Upvotes: 1
