worldCurrencies
worldCurrencies

Reputation: 467

How to groupby Month and Year and then sum total in Pandas?

In my dataset, I have 3 columns... I am looking to group by month and year... however i am also looking to group by Name and sum the Price.

Here is a mock data set I've made:

import pandas as pd

# initialise data of lists.
data = {'Name':['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B','B','B', 'C', 'C', 'A', 'C', 'B'], 
'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20', 
'03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'], 
'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)
totalSum = df.groupby([df['Date'].dt.year, df['Date'].df.month]).agg({'Price':sum})

The output should look something like:

06/2019
    A   56
    B   120
    C   48
03/2020
    A   12
    B   94
...

etc.

Upvotes: 0

Views: 3043

Answers (2)

Iqigai
Iqigai

Reputation: 421

You were almost there, but for it to work, you needed first to call pandas' to_datetime() method twice to generate the years and the months based on the 'Date' and to use 'Name' as an additional argument for the groupbby call:

totalSum = df.groupby([pd.to_datetime(df['Date']).dt.year,
                       pd.to_datetime(df['Date']).dt.month,
                       'Name']).agg({'Price': sum})
totalSum
Out[17]: 
                Price
Date Date Name       
2019 6    A        56
          B       120
          C        48
2020 3    A        12
          B        94
     6    B        52
          C        87
     7    A        10
          B        37
          C        39

Upvotes: 3

Park
Park

Reputation: 2484

Good try, but the column names are required to groupby, not the value of something like you tried. Thus, I updated df['Date']' form such as '2019-06' and used 'Date' and 'Name' columns for groupby, as follows:

import pandas as pd

# initialise data of lists.
data = {
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
    'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]
}

# Create DataFrame
df = pd.DataFrame(data)

# to make 06/01/19 to 06/01/2019
df['Date'] = df['Date'].apply(lambda x: x[:6]+'20'+x[6:])  

# to remove Day and leave only Year-Month
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y').apply(lambda x: x.strftime('%Y-%m'))  

# Print the output.
totalSum = df.groupby(by=['Date', 'Name']).agg({'Price': sum})
print(totalSum)
#              Price
#Date    Name       
#2019-06 A        56
#        B       120
#        C        48
#2020-03 A        12
#        B        94
#2020-06 B        52
#        C        87
#2020-07 A        10
#        B        37
#        C        39

Upvotes: 0

Related Questions