Reputation: 1015
I have this dataframe of transactions, each transaction has a customer, who has a credit card and each customer may have up to 3 credit cards (bronze, silver and gold), some customers bough with different credit cards on the same month, and other customers bought with the same credit card on different months.
CUST_CODE CATEGORY MONTH PURCHASED
4111 BRONZE JAN 1
6326 SILVER FEB 1
4666 GOLD JAN 1
8062 BRONZE JAN 1
FEB 1
SILVER JAN 1
FEB 1
9393 BRONZE JAN 1
1021 SILVER FEB 1
JAN 1
9244 BRONZE JAN 1
7963 SILVER FEB 1
GOLD JAN 1
FEB 1
2962 BRONZE JAN 1
2893 SILVER FEB 1
I want to create a pivot table, but I need the total of rows to count the customer just once (even if he has several cards), at the same time, the total of columns should count the customer just once (even if the customer has bought on several months)
This is a regular pivot table:
df = df.pivot_table(index='CATEGORY', columns='MONTH', values='CUST_CODE',
aggfunc='count', margins=True, margins_name='TOTAL')
CATEGORY JAN FEB Grand Total
BRONZE 5 1 6
GOLD 2 1 3
SILVER 2 5 7
TOTAL 9 7 16
But this is the output I really want, note that the total of both rows and columns equals the total amount of unique customers (10):
CATEGORY JAN FEB Grand Total
BRONZE 5 1 5
GOLD 2 1 1
SILVER 2 5 4
TOTAL 6 4 10
How do I get there?
Upvotes: 1
Views: 2479
Reputation: 1015
This is what I needed:
func = lambda x: x.nunique()
df = df.pivot_table(index='CATEGORY', columns='MONTH', values='CUST_CODE',
aggfunc=func, margins=True, margins_name='TOTAL')
Upvotes: 2