Reputation: 467
In my dataset, I have 3 columns... I am looking to group by month and year... however i am also looking to group by Name and sum the Price.
Here is a mock data set I've made:
import pandas as pd
# initialise data of lists.
data = {'Name':['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B','B','B', 'C', 'C', 'A', 'C', 'B'],
'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20',
'03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
totalSum = df.groupby([df['Date'].dt.year, df['Date'].df.month]).agg({'Price':sum})
The output should look something like:
06/2019
A 56
B 120
C 48
03/2020
A 12
B 94
...
etc.
Upvotes: 0
Views: 3043
Reputation: 421
You were almost there, but for it to work, you needed first to call pandas' to_datetime()
method twice to generate the years and the months based on the 'Date' and to use 'Name' as an additional argument for the groupbby
call:
totalSum = df.groupby([pd.to_datetime(df['Date']).dt.year,
pd.to_datetime(df['Date']).dt.month,
'Name']).agg({'Price': sum})
totalSum
Out[17]:
Price
Date Date Name
2019 6 A 56
B 120
C 48
2020 3 A 12
B 94
6 B 52
C 87
7 A 10
B 37
C 39
Upvotes: 3
Reputation: 2484
Good try, but the column names are required to groupby, not the value of something like you tried. Thus, I updated df['Date']' form such as '2019-06' and used 'Date' and 'Name' columns for groupby, as follows:
import pandas as pd
# initialise data of lists.
data = {
'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]
}
# Create DataFrame
df = pd.DataFrame(data)
# to make 06/01/19 to 06/01/2019
df['Date'] = df['Date'].apply(lambda x: x[:6]+'20'+x[6:])
# to remove Day and leave only Year-Month
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y').apply(lambda x: x.strftime('%Y-%m'))
# Print the output.
totalSum = df.groupby(by=['Date', 'Name']).agg({'Price': sum})
print(totalSum)
# Price
#Date Name
#2019-06 A 56
# B 120
# C 48
#2020-03 A 12
# B 94
#2020-06 B 52
# C 87
#2020-07 A 10
# B 37
# C 39
Upvotes: 0