Reputation: 21
I have a data frame that looks like this:
month organization customer_id
01 ABC 001
01 ABC 001
01 ABC 002
02 XYZ 003
03 XYZ 002
06 XYZ 006
04 XYZ 007
03 DEF 004
04 DEF 005
04 MNO 004
Using pandas, I would like to group by each organization and distinct count the number of customers unique to each organization (if a customer visited another organization then the customer is not unique).
Desired output
organization unique_customer_count
ABC 1
XYZ 3
DEF 1
How can I achieve this using pandas? I would really appreciate the help.
Upvotes: 0
Views: 158
Reputation: 862681
First remove duplicates per both columns by first DataFrame.drop_duplicates
, then remove duplicates per customer_id
and last count unique values per organization
:
df1 = (df.drop_duplicates(['organization','customer_id'])
.drop_duplicates(['customer_id'], keep=False)
.groupby('organization', sort=False)
.size()
.reset_index(name='unique_customer_count'))
print (df1)
organization unique_customer_count
0 ABC 1
1 XYZ 3
2 DEF 1
Upvotes: 2