user16044878
user16044878

Reputation: 21

Using pandas to select customers unique to each organization

I have a data frame that looks like this:

month organization customer_id
01     ABC          001
01     ABC          001
01     ABC          002
02     XYZ          003
03     XYZ          002
06     XYZ          006
04     XYZ          007
03     DEF          004
04     DEF          005
04     MNO          004

Using pandas, I would like to group by each organization and distinct count the number of customers unique to each organization (if a customer visited another organization then the customer is not unique).

Desired output

organization   unique_customer_count
ABC              1
XYZ              3
DEF              1

How can I achieve this using pandas? I would really appreciate the help.

Upvotes: 0

Views: 158

Answers (1)

jezrael
jezrael

Reputation: 862681

First remove duplicates per both columns by first DataFrame.drop_duplicates, then remove duplicates per customer_id and last count unique values per organization:

df1 = (df.drop_duplicates(['organization','customer_id'])
         .drop_duplicates(['customer_id'], keep=False)
         .groupby('organization', sort=False)
         .size()
         .reset_index(name='unique_customer_count'))
print (df1)
  organization  unique_customer_count
0          ABC                      1
1          XYZ                      3
2          DEF                      1

Upvotes: 2

Related Questions