L Xandor
L Xandor

Reputation: 1841

Pandas Python - get value counts by grouped values

I have a situation where multiple customer ids can be belong to the same account, and all records are grouped in to one of two groups:

CUSTOMER_ACCOUNT_ID     CUSTOMER_ID GROUP
123                          555    A
123                          556    A
124                          557    B
124                          558    B
125                          559    A

What I want to do is get the count of unique CUSTOMER_ACCOUNT_IDs belonging to each group. That is, I don't care how many customer_ids belong to an account, I just want to see how many accounts is in each group. I'm looking for this output

GROUP   COUNT
A        2
B        1

That is, number of unique accounts by group. One way of thinking of it is that I want to collapse or remove the CUSTOMER_ID dimension, so I'm left with

CUSTOMER_ACCOUNT_ID     GROUP
123                       A
124                       B
125                       A

and then do a value count of GROUP, but I'm unsure of how to approach this. I did find an ugly way to do this, but I'm new to pandas from using R so I'm guessing there's a more straightforward way that I don't know yet...

Upvotes: 2

Views: 1242

Answers (2)

Nehal J Wani
Nehal J Wani

Reputation: 16619

Is this what you want to achieve?

In [103]: df.drop_duplicates(['CUSTOMER_ACCOUNT_ID', 'GROUP']).drop('CUSTOMER_ID', 1)
Out[103]: 
   CUSTOMER_ACCOUNT_ID GROUP
0                  123     A
2                  124     B
4                  125     A

Upvotes: 2

akuiper
akuiper

Reputation: 214957

You can use the nunique() method after the grouping:

df.groupby('GROUP')['CUSTOMER_ACCOUNT_ID'].nunique().reset_index()

# GROUP CUSTOMER_ACCOUNT_ID
# 0   A                   2
# 1   B                   1

Upvotes: 3

Related Questions