RJL
RJL

Reputation: 13

How can I assign chronological order position or each customers purchase history using Python

I have a data set of all customers purchases made in the last 5 years with the below columns as an example.

CUSTOMER ID | ORDER NUMBER | ORDER DATE (DateTime) | ITEM ID

What I need to do is assign each individual customer order position (i.e 1,2,3,4) - So for each customer I am able to know which order was their 1st 2nd or 3rd using the above criteria

Things to consider:

  1. There are multiple Customer ID's and Order Numbers in the same table
  2. There are multiple rows from the same customer & order number combination as each order has multiple items, so if the first order for a customer has 3 items on it I want all 3 items to show as 1.

Struggling to find a starting point of how to do this

Below is the data table:

import pandas as pd
df = pd.DataFrame({'Customer ID' : ['C100003','C100002','C100002','C100002','C100003','C100002'],
                    'Order Number' : ['RJSJ0053','RJSJ0060','RJSJ0085','RJSJ0085','RJSJ0089','RJSJ0092'],
                    'Order Date' : ['2023-05-08','2023-06-09','2023-06-13','2023-06-13','2023-06-13','2023-06-14'],
                    'Item ID' : ['Mouse','Keyboard','Computer','Monitor','Keyboard','Headset'],
                    'Order Position' : [1,1,2,2,2,3]})

Upvotes: 1

Views: 128

Answers (1)

Stu Sztukowski
Stu Sztukowski

Reputation: 12909

First, remove all duplicate [Customer ID, Order Number]. This will get you a table that looks like this:

  Customer ID Order Number  Order Date   Item ID
0     C100003     RJSJ0053  2023-05-08     Mouse
1     C100002     RJSJ0060  2023-06-09  Keyboard
2     C100002     RJSJ0085  2023-06-13  Computer
4     C100003     RJSJ0089  2023-06-13  Keyboard
5     C100002     RJSJ0092  2023-06-14   Headset

Now we can sort this by [Customer ID, Order Date], create a cumulative count grouped by Customer ID, then fill forward the missing values since those are multiple items in the same order.

df['Order Position'] = (df.drop_duplicates(['Customer ID', 'Order Number'])
                          .sort_values(['Customer ID', 'Order Date'])
                          .groupby('Customer ID')
                          .cumcount()+1
                       )

df['Order Position'] = (df.sort_values(['Customer ID', 'Order Date'])['Order Position']
                          .fillna(method='ffill')
                          .astype(int)
                       )
  Customer ID Order Number  Order Date   Item ID  Order Position
0     C100003     RJSJ0053  2023-05-08     Mouse               1
1     C100002     RJSJ0060  2023-06-09  Keyboard               1
2     C100002     RJSJ0085  2023-06-13  Computer               2
3     C100002     RJSJ0085  2023-06-13   Monitor               2
4     C100003     RJSJ0089  2023-06-13  Keyboard               2
5     C100002     RJSJ0092  2023-06-14   Headset               3

Upvotes: 1

Related Questions