Reputation: 21
Working with sales data, each row represents a product that has been sold. Each order can consist of either a single product (one row) or multiple products (multiple rows). Each customer could have placed multiple orders throughout the dataset. I'm trying to implement a counter column where each new order would mean +1 to the counter, each product within that order should get the same counter value. With each customer the counter should start over.
html snippet of what the outcome should be because I'm not allowed to post screenshot:
<style type="text/css">
table.tableizer-table {
font-size: 12px;
border: 1px solid #CCC;
font-family: Arial, Helvetica, sans-serif;
}
.tableizer-table td {
padding: 4px;
margin: 3px;
border: 1px solid #CCC;
}
.tableizer-table th {
background-color: #104E8B;
color: #FFF;
font-weight: bold;
}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Customer_ID</th><th>Order_ID</th><th>Date</th><th>Product_ID</th><th>Counter</th></tr></thead><tbody>
<tr><td>56HS3F</td><td>3456HJ</td><td>16-04-2019</td><td>Product A</td><td>1</td></tr>
<tr><td>56HS3F</td><td>3456HJ</td><td>16-04-2019</td><td>Product C</td><td>1</td></tr>
<tr><td>56HS3F</td><td>1234QQ</td><td>25-05-2019</td><td>Product A</td><td>2</td></tr>
<tr><td>56HS3F</td><td>3333HI</td><td>26-05-2019</td><td>Product B</td><td>3</td></tr>
<tr><td>32AS88</td><td>1111SZ</td><td>20-12-2018</td><td>Product B</td><td>1</td></tr>
<tr><td>32AS88</td><td>1111SZ</td><td>20-12-2018</td><td>Product A</td><td>1</td></tr>
<tr><td>32AS88</td><td>2234KL</td><td>20-12-2018</td><td>Product C</td><td>2</td></tr>
<tr><td>678HJI</td><td>6786ER</td><td>21-09-2019</td><td>Product C</td><td>1</td></tr>
</tbody></table>
I have formed groups based on two categories: Customer_ID and Order_ID
I've tried working with ngroup() but this seems to ignore the outer group 'Customer_ID' and counts over the whole data frame looking for similar 'Order_ID's only. I've also tried with .cumcount() but this does respect my grouping and iterates within the nested 'Order_ID' group, but I want it to count over each Order_ID not within.
data['Counter'] = data.groupby(['Customer_ID', 'Order_ID']).ngroup()
data['Counter'] = data.groupby(['Customer_ID', 'Order_ID']).cumcount()
Especially with .ngroup() i expected it to respect my group-within-group structure but it seems to disregard my 'Customer_ID' grouping.
Update: Found the Answer
I found my answer! I created a tracker to see if the Order_ID changed within each Customer_ID. Then I could use .cumsum(), grouping for Customer_ID, on 'Order_Change' to count the 'True' values.
data['Order_Change'] = (data.Order_ID!=df.Order_ID.shift()) | (df.Customer_ID!=df.Customer_ID.shift())
data['Counter'] = df.groupby('Customer_ID')['Order_Change'].cumsum()
Upvotes: 0
Views: 74
Reputation: 21
I found my answer! I created a tracker to see if the Order_ID changed within each Customer_ID. Then I could use .cumsum(), grouping for Customer_ID, on 'Order_Change' to count the 'True' values.
data['Order_Change'] = (data.Order_ID!=df.Order_ID.shift()) | (df.Customer_ID!=df.Customer_ID.shift())
data['Counter'] = df.groupby('Customer_ID')['Order_Change'].cumsum()
Upvotes: 2