Reputation: 1841
I have a situation where multiple customer IDs can belong to the same account, and all records are grouped into one of two groups:
CUSTOMER_ACCOUNT_ID  CUSTOMER_ID  GROUP
123                  555          A
123                  556          A
124                  557          B
124                  558          B
125                  559          A
What I want to do is get the count of unique CUSTOMER_ACCOUNT_IDs belonging to each group. That is, I don't care how many customer_ids belong to an account; I just want to see how many accounts are in each group. I'm looking for this output:
GROUP  COUNT
A      2
B      1
That is, the number of unique accounts by group. One way of thinking about it is that I want to collapse or remove the CUSTOMER_ID dimension, so I'm left with:
CUSTOMER_ACCOUNT_ID  GROUP
123                  A
124                  B
125                  A
and then do a value count of GROUP, but I'm unsure how to approach this. I did find an ugly way to do it, but I'm new to pandas, coming from R, so I'm guessing there's a more straightforward way that I don't know yet...
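For anyone who wants to try this out, here's a minimal sketch of the sample data as a DataFrame (the name df is just a placeholder):
import pandas as pd

# Sample data from the table above; df is a placeholder name.
df = pd.DataFrame({
    'CUSTOMER_ACCOUNT_ID': [123, 123, 124, 124, 125],
    'CUSTOMER_ID': [555, 556, 557, 558, 559],
    'GROUP': ['A', 'A', 'B', 'B', 'A'],
})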
Upvotes: 2
Views: 1242
Reputation: 16619
Is this what you want to achieve?
In [103]: df.drop_duplicates(['CUSTOMER_ACCOUNT_ID', 'GROUP']).drop('CUSTOMER_ID', axis=1)
Out[103]:
   CUSTOMER_ACCOUNT_ID GROUP
0                  123     A
2                  124     B
4                  125     A
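From there, chaining a value_counts() on GROUP should give the counts the question asks for; a sketch building on the snippet above:
df.drop_duplicates(['CUSTOMER_ACCOUNT_ID', 'GROUP'])['GROUP'].value_counts()
# A    2
# B    1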
Upvotes: 2
Reputation: 214957
You can use the nunique() method after the grouping:
df.groupby('GROUP')['CUSTOMER_ACCOUNT_ID'].nunique().reset_index()
#   GROUP  CUSTOMER_ACCOUNT_ID
# 0     A                    2
# 1     B                    1
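If the column should literally be called COUNT as in the desired output, you can name it while resetting the index; for example:
df.groupby('GROUP')['CUSTOMER_ACCOUNT_ID'].nunique().reset_index(name='COUNT')
#   GROUP  COUNT
# 0     A      2
# 1     B      1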
Upvotes: 3