Reputation: 696
I am trying to group by monthtly_purchases
and region
to get the count of customers and sum of monthly spending, however, I get the error below:
Main dataframe:
customer_id monthly_spending month monthtly_purchases region
32324 342 Feb-2019 5 A
34345 293 Feb-2019 5 A
45453 212 Feb-2019 3 A
34343 453 Feb-2019 3 A
53533 112 Feb-2019 5 B
12334 511 Feb-2019 5 B
99934 123 Feb-2019 3 B
21213 534 Feb-2019 3 B
32324 143 March-2019 5 A
34345 453 March-2019 5 A
45453 234 March-2019 3 A
34343 432 March-2019 3 A
53533 124 March-2019 5 B
12334 453 March-2019 5 B
99934 224 March-2019 3 B
21213 634 March-2019 3 B
Output dataframe:
monthly_purchases region monthly_spending count_customers month
5 A 635 2 Feb-2019
3 A 665 2 Feb-2019
5 B 623 2 Feb-2019
3 B 657 2 Feb-2019
5 A 596 2 Feb-2019
3 A 666 2 Feb-2019
5 B 556 2 Feb-2019
3 B 858 2 Feb-2019
This is what I've tried so far but I get the following error:
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby('monthtly_purchases', 'region').agg(d)
agg_df
Error msg: No numeric types to aggregate
Upvotes: 1
Views: 217
Reputation: 3971
Remember to put the column names inside a list, when you use group by over 2 or more columns:
import pandas as pd
df = pd.DataFrame([
[32324, 342, "Feb-2019", 5, "A"],
[34345, 293, "Feb-2019", 5, "A"],
[45453, 212, "Feb-2019", 3, "A"],
[34343, 453, "Feb-2019", 3, "A"],
[53533, 112, "Feb-2019", 5, "B"],
[12334, 511, "Feb-2019", 5, "B"],
[99934, 123, "Feb-2019", 3, "B"],
[21213, 534, "Feb-2019", 3, "B"]
],
columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"]
)
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d)
print(agg_df)
Returns:
customer_id monthly_spending
count sum
monthtly_purchases region
3 A 2 665
B 2 657
5 A 2 635
B 2 623
As requested in the comments, making the multi-index explicit (splitting it into columns by creating a new index):
agg_df.reset_index(inplace=True)
print(agg_df)
Returns:
monthtly_purchases region customer_id monthly_spending
count sum
0 3 A 2 665
1 3 B 2 657
2 5 A 2 635
3 5 B 2 623
Including months as requested in the comments:
agg_df = df.groupby(["month", "monthtly_purchases", "region"], as_index=False).agg(d)
Returns:
month monthtly_purchases region customer_id monthly_spending
count sum
0 Feb-2019 3 A 2 665
1 Feb-2019 3 B 2 657
2 Feb-2019 5 A 2 635
3 Feb-2019 5 B 2 623
4 March-2019 3 A 2 666
5 March-2019 3 B 2 858
6 March-2019 5 A 2 596
7 March-2019 5 B 2 577
Upvotes: 1
Reputation: 35275
The order of the columns is different, but you can get it with the following code.
df = df.groupby(['monthtly_purchases','region','month']).agg({'customer_id': 'size', 'monthly_spending': 'sum'}).reset_index()
df
monthtly_purchases region month customer_id monthly_spending
0 3 A Feb-2019 2 665
1 3 B Feb-2019 2 657
2 5 A Feb-2019 2 635
3 5 B Feb-2019 2 623
Upvotes: 0