Alejandro L
Alejandro L

Reputation: 139

How can I group by category and sum total sales by month - pandas

I have the following DataFrame:

test = {'Date': ['2021-01-01', '2021-01-15', '2021-01-02', '2021-01-14', '2021-01-05', '2021-01-07'],
        'Category': ['Fixed', 'Fixed', 'Mindful Spending', 'Mindful Spending', 'Subscription', 'Subscription'],
       'Amount': ['5', '5', '5', '5', '5', '5']}
example = pd.DataFrame(test)
example

enter image description here

My Date column data type is datetime64[ns], transformed using pd.to_datetime. I want to groupby Category and sum total amount spent by Date month. The result that I want is something like this:

test = {'Date': ['2021-01', '2021-01', '2021-01'], 
        'Category': ['Fixed', 'Mindful Spending', 'Subscription'],
       'Amount': ['10', '10', '10']}
result = pd.DataFrame(test)
result

enter image description here

How can I do this?

Thank you for all the help!

Upvotes: 2

Views: 5560

Answers (2)

BramV
BramV

Reputation: 556

The way I prefer to approach this is by creating one additional column on which you then groupby.

import pandas as pd
test = {'Date': ['2021-01-01', '2021-01-15', '2021-01-02', '2021-01-14', '2021-01-05', '2021-01-07'],
    'Category': ['Fixed', 'Fixed', 'Mindful Spending', 'Mindful Spending', 'Subscription', 'Subscription'],
   'Amount': [5, 5, 5, 5, 5, 5]}
example = pd.DataFrame(test)

# convert to datetime column
example['Date'] = pd.to_datetime(example['Date']) 

# Extract formatted string on which to groupby
example['date_month'] = example['Date'].dt.strftime('%Y-%m')

example.groupby(['date_month', 'Category'], as_index=False)['Amount'].sum()

The result then is exactly as requested: enter image description here

Upvotes: 4

mcsoini
mcsoini

Reputation: 6642

You can use pd.Grouper to define months to group by. Note that for this purpose the Date column needs to be your index.

test = {'Date': ['2021-01-01', '2021-01-15', '2021-01-02', '2021-01-14', '2021-01-05', '2021-01-07'],
        'Category': ['Fixed', 'Fixed', 'Mindful Spending', 'Mindful Spending', 'Subscription', 'Subscription'],
       'Amount': [5, 5, 5, 5, 5, 5]}
example = pd.DataFrame(test)
example.Date = pd.DatetimeIndex(example.Date)

example.set_index("Date").groupby([pd.Grouper(freq="M"), "Category"]).sum().reset_index()


Out[2]: 
        Date          Category  Amount
0 2021-01-31             Fixed      10
1 2021-01-31  Mindful Spending      10
2 2021-01-31      Subscription      10

Upvotes: 8

Related Questions