How to create a counter column that counts (inner) groups within an (outer) group, restarting for each (outer) group, in Python

Question

I'm working with sales data in which each row represents one product that has been sold. An order can consist of a single product (one row) or several products (multiple rows), and each customer may have placed several orders throughout the dataset. I'm trying to create a counter column that increases by 1 with each new order. Every product within the same order should get the same counter value, and the counter should start over for each customer.

This is what the outcome should look like:



Customer_ID  Order_ID  Date        Product_ID  Counter
56HS3F       3456HJ    16-04-2019  Product A   1
56HS3F       3456HJ    16-04-2019  Product C   1
56HS3F       1234QQ    25-05-2019  Product A   2
56HS3F       3333HI    26-05-2019  Product B   3
32AS88       1111SZ    20-12-2018  Product B   1
32AS88       1111SZ    20-12-2018  Product A   1
32AS88       2234KL    20-12-2018  Product C   2
678HJI       6786ER    21-09-2019  Product C   1
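To experiment with the approaches discussed below, the table above can be rebuilt as a pandas DataFrame. This is a minimal sketch that assumes the column names from the table and keeps Date as a plain string; pd.to_datetime(..., format="%d-%m-%Y") could parse it if real dates are needed.

```python
import pandas as pd

# Rows copied from the expected-outcome table; Counter is the target column
data = pd.DataFrame({
    "Customer_ID": ["56HS3F", "56HS3F", "56HS3F", "56HS3F",
                    "32AS88", "32AS88", "32AS88", "678HJI"],
    "Order_ID": ["3456HJ", "3456HJ", "1234QQ", "3333HI",
                 "1111SZ", "1111SZ", "2234KL", "6786ER"],
    "Date": ["16-04-2019", "16-04-2019", "25-05-2019", "26-05-2019",
             "20-12-2018", "20-12-2018", "20-12-2018", "21-09-2019"],
    "Product_ID": ["Product A", "Product C", "Product A", "Product B",
                   "Product B", "Product A", "Product C", "Product C"],
    "Counter": [1, 1, 2, 3, 1, 1, 2, 1],
})
print(data)
```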
 678HJI 6786ER 21-09-2019 Product C 1

I have grouped the data by two columns: Customer_ID and Order_ID.

I've tried ngroup(), but it seems to ignore the outer group 'Customer_ID': it numbers every (Customer_ID, Order_ID) combination across the whole data frame, so the count never restarts for a new customer. I've also tried .cumcount(). It does respect my grouping, but it counts the rows within each nested 'Order_ID' group, whereas I want to count the Order_ID groups themselves.

data['Counter'] = data.groupby(['Customer_ID', 'Order_ID']).ngroup()

data['Counter'] = data.groupby(['Customer_ID', 'Order_ID']).cumcount()
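Running both attempts on the sample rows makes the behaviour concrete. This sketch uses only the two ID columns from the table. With the default sort=True, ngroup() gives each (Customer_ID, Order_ID) pair a single number across the whole frame, assigned in sorted key order, so it neither restarts per customer nor follows the order in which the rows appear. cumcount() numbers the rows inside each pair, starting at 0.

```python
import pandas as pd

# Sample rows from the expected-outcome table (Date and Product_ID omitted)
data = pd.DataFrame({
    "Customer_ID": ["56HS3F", "56HS3F", "56HS3F", "56HS3F",
                    "32AS88", "32AS88", "32AS88", "678HJI"],
    "Order_ID": ["3456HJ", "3456HJ", "1234QQ", "3333HI",
                 "1111SZ", "1111SZ", "2234KL", "6786ER"],
})

grouped = data.groupby(["Customer_ID", "Order_ID"])

# ngroup(): one number per (Customer_ID, Order_ID) pair, counted over the
# whole frame in sorted key order; it never resets for a new customer
ngroup_result = grouped.ngroup()
print(ngroup_result.tolist())  # [4, 4, 2, 3, 0, 0, 1, 5]

# cumcount(): position of each row inside its own pair, starting at 0
cumcount_result = grouped.cumcount()
print(cumcount_result.tolist())  # [0, 1, 0, 0, 0, 1, 0, 0]
```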

Especially with .ngroup(), I expected it to respect my group-within-group structure, but it seems to disregard my 'Customer_ID' grouping.
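One way to keep the ngroup() idea is to make it number the pairs in order of appearance and then restart the numbering per customer. The sketch below is an alternative to the answer further down, not the asker's method: it passes sort=False so ngroup() follows first appearance, then dense-ranks those numbers within each Customer_ID with SeriesGroupBy.rank(method="dense"). Unlike a shift-based comparison, it does not require the rows of one order to be adjacent.

```python
import pandas as pd

data = pd.DataFrame({
    "Customer_ID": ["56HS3F", "56HS3F", "56HS3F", "56HS3F",
                    "32AS88", "32AS88", "32AS88", "678HJI"],
    "Order_ID": ["3456HJ", "3456HJ", "1234QQ", "3333HI",
                 "1111SZ", "1111SZ", "2234KL", "6786ER"],
})

# With sort=False, ngroup() numbers the (Customer_ID, Order_ID) pairs in order
# of first appearance instead of in sorted key order
order_no = data.groupby(["Customer_ID", "Order_ID"], sort=False).ngroup()

# Dense-rank those numbers within each customer: rows of the same order share
# a rank, and each customer's first order gets rank 1 (rank returns floats)
data["Counter"] = (
    order_no.groupby(data["Customer_ID"]).rank(method="dense").astype(int)
)
print(data["Counter"].tolist())  # [1, 1, 2, 3, 1, 1, 2, 1]
```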

Update: Found the Answer
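I found my answer! I created a flag column, 'Order_Change', that is True whenever the Order_ID (or the Customer_ID) differs from the previous row. Then I used .cumsum() on 'Order_Change', grouped by Customer_ID, to count the True values within each customer. This relies on all rows of an order being next to each other.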

data['Order_Change'] = (data.Order_ID != data.Order_ID.shift()) | (data.Customer_ID != data.Customer_ID.shift())

data['Counter'] = data.groupby('Customer_ID')['Order_Change'].cumsum()
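A self-contained version of the answer's shift/cumsum approach, wrapped in a hypothetical helper named order_counter, is sketched below. It casts the flag to int before the grouped cumsum so the result is an integer column. The second call illustrates the adjacency assumption: if the rows of one order are split by another order, that order is counted twice.

```python
import pandas as pd

def order_counter(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number orders per customer with shift/cumsum.
    Assumes rows belonging to the same order are adjacent."""
    # True on the first row of each run of identical (Customer_ID, Order_ID)
    order_change = (data["Order_ID"] != data["Order_ID"].shift()) | (
        data["Customer_ID"] != data["Customer_ID"].shift()
    )
    # Running count of order starts, restarting for every customer
    return order_change.astype(int).groupby(data["Customer_ID"]).cumsum()

data = pd.DataFrame({
    "Customer_ID": ["56HS3F", "56HS3F", "56HS3F", "56HS3F",
                    "32AS88", "32AS88", "32AS88", "678HJI"],
    "Order_ID": ["3456HJ", "3456HJ", "1234QQ", "3333HI",
                 "1111SZ", "1111SZ", "2234KL", "6786ER"],
})
data["Counter"] = order_counter(data)
print(data["Counter"].tolist())  # [1, 1, 2, 3, 1, 1, 2, 1]

# Order O1's rows are not adjacent, so it is counted as two separate orders
split = pd.DataFrame({"Customer_ID": ["C1", "C1", "C1"],
                      "Order_ID": ["O1", "O2", "O1"]})
print(order_counter(split).tolist())  # [1, 2, 3]
```

If the data might not be ordered that way, sorting it first or using an appearance-order numbering that does not depend on adjacent rows avoids the double count.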
