syrup
syrup

Reputation: 1015

Pandas pivot table with unique totals

I have this dataframe of transactions, each transaction has a customer, who has a credit card and each customer may have up to 3 credit cards (bronze, silver and gold), some customers bough with different credit cards on the same month, and other customers bought with the same credit card on different months.

CUST_CODE   CATEGORY    MONTH   PURCHASED
4111        BRONZE      JAN     1
6326        SILVER      FEB     1
4666        GOLD        JAN     1
8062        BRONZE      JAN     1
                        FEB     1
            SILVER      JAN     1
                        FEB     1
9393        BRONZE      JAN     1
1021        SILVER      FEB     1
                        JAN     1
9244        BRONZE      JAN     1
7963        SILVER      FEB     1
            GOLD        JAN     1
                        FEB     1
2962        BRONZE      JAN     1
2893        SILVER      FEB     1

I want to create a pivot table, but I need the total of rows to count the customer just once (even if he has several cards), at the same time, the total of columns should count the customer just once (even if the customer has bought on several months)

This is a regular pivot table:

df = df.pivot_table(index='CATEGORY', columns='MONTH', values='CUST_CODE',  
                    aggfunc='count', margins=True, margins_name='TOTAL')         

CATEGORY    JAN    FEB     Grand Total
BRONZE      5      1       6
GOLD        2      1       3
SILVER      2      5       7
TOTAL       9      7       16

But this is the output I really want, note that the total of both rows and columns equals the total amount of unique customers (10):

CATEGORY    JAN    FEB     Grand Total
BRONZE      5      1       5
GOLD        2      1       1
SILVER      2      5       4
TOTAL       6      4       10

How do I get there?

Upvotes: 1

Views: 2479

Answers (1)

syrup
syrup

Reputation: 1015

This is what I needed:

func = lambda x: x.nunique()
df = df.pivot_table(index='CATEGORY', columns='MONTH', values='CUST_CODE',  
                    aggfunc=func, margins=True, margins_name='TOTAL')  

Upvotes: 2

Related Questions