I would like to get the distinct count of products per order_number. I managed to get the total product count (thanks to the help of another SO user), but I can't figure out the distinct count.
This is what I have:
data['total_productcount'] = data.groupby(['order_number'])['order_number'].transform('size')
And it gives:
order_number product_id total_productcount
171-1046037-0511522 4260179734731 5
171-1046037-0511522 4054673034394 5
171-1046037-0511522 4054673001235 5
171-1046037-0511522 4054673005752 5
171-1046037-0511522 5011385960075 5
171-1046037-0511522 5011385960075 5
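A minimal runnable sketch of the line above, assuming a DataFrame named `data` built from the six sample rows shown (values copied from the printout). `transform('size')` returns each group's row count broadcast back to every row, so the repeated 5011385960075 is counted twice. On exactly these six rows every row gets 6, which differs from the 5 in the printout, so treat the excerpt as illustrative.

```python
import pandas as pd

# Sample rows copied from the question; product_id fits in int64.
data = pd.DataFrame({
    "order_number": ["171-1046037-0511522"] * 6,
    "product_id": [
        4260179734731,
        4054673034394,
        4054673001235,
        4054673005752,
        5011385960075,
        5011385960075,
    ],
})

# 'size' counts rows per order (duplicates included) and repeats
# that count on every row of the group.
data["total_productcount"] = data.groupby("order_number")["order_number"].transform("size")
print(data)
```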
This is the DataFrame I would like to generate (including a new distinct_productcount column):
order_number product_id total_productcount distinct_productcount
171-1046037-0511522 4260179734731 5 1
171-1046037-0511522 4054673034394 5 1
171-1046037-0511522 4054673001235 5 1
171-1046037-0511522 4054673005752 5 1
171-1046037-0511522 5011385960075 5 1
171-1046037-0511522 5011385960075 5 2
How can I generate "distinct_productcount"?
data.groupby('order_number').product_id.nunique()
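A small sketch of what that one-liner returns, using the question's sample rows (the names `distinct_per_order` and `occurrence` are illustrative, not from the answer). `nunique` gives one value per order, here 5 distinct product IDs. Note that the last column of the desired output in the question (1, 1, 1, 1, 1, 2) is not a distinct count but the running occurrence number of each product within its order; if that is what's actually wanted, `cumcount() + 1` over both keys produces it.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "order_number": ["171-1046037-0511522"] * 6,
    "product_id": [4260179734731, 4054673034394, 4054673001235,
                   4054673005752, 5011385960075, 5011385960075],
})

# One entry per order_number: how many distinct product_ids it contains.
distinct_per_order = data.groupby("order_number").product_id.nunique()
print(distinct_per_order)

# Alternative reading of the desired column: 1 the first time a product_id
# appears in an order, 2 the second time, and so on (original row order kept).
data["occurrence"] = data.groupby(["order_number", "product_id"]).cumcount() + 1
print(data)
```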
To add it as a new column, use either transform or join.
via transform
# one value per row, aligned to data's index
s = data.groupby('order_number').product_id.transform('nunique')
data = data.assign(distinct_productcount=s)
via join
# one value per order_number, matched back on the order_number column
s = data.groupby('order_number').product_id.nunique()
data = data.join(s.rename('distinct_productcount'), on='order_number')
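A self-contained sketch running both variants on the sample rows and checking that they agree (`via_transform` and `via_join` are illustrative names). `transform` yields a Series already aligned to the frame's index, so it can be assigned directly; `join` instead matches the per-order Series, whose index is order_number, against the `order_number` column, which is handy if you also want to keep the per-order Series around.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "order_number": ["171-1046037-0511522"] * 6,
    "product_id": [4260179734731, 4054673034394, 4054673001235,
                   4054673005752, 5011385960075, 5011385960075],
})

# via transform: broadcast each order's distinct count to its rows
s = data.groupby("order_number").product_id.transform("nunique")
via_transform = data.assign(distinct_productcount=s)

# via join: a named Series indexed by order_number, joined on that column
s = data.groupby("order_number").product_id.nunique()
via_join = data.join(s.rename("distinct_productcount"), on="order_number")

# Both approaches should produce the same column.
assert (via_transform["distinct_productcount"] == via_join["distinct_productcount"]).all()
print(via_join)
```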