Reputation: 1079
I have purchase data that I'm trying to analyze. I'm attempting to find the 2nd and 3rd purchases (and potentially 4, 5, etc.) for each user.
The data I have currently looks like this:
email, first_purchase_date, processed_date, sku, first_order
[email protected] 6/1/2019 6/1/2019 HG1555 HG1555
[email protected] 6/1/2019 8/1/2019 RF2655 HG1555
[email protected] 6/1/2019 8/1/2019 FP7789 HG1555
[email protected] 6/1/2019 11/1/2019 RF2655 HG1555
[email protected] 7/1/2019 7/1/2019 RF2655 RF2655
[email protected] 7/1/2019 8/1/2019 HG1555 RF2655
[email protected] 7/1/2019 8/1/2019 FP7789 RF2655
[email protected] 9/1/2019 6/1/2019 HG1555 HG1555
[email protected] 9/1/2019 11/1/2019 FP7789 HG1555
The first order column is looking at what was in the first basket that the customer purchased. Is there a way I can create another column that will tell me exactly what purchase number each line item is? Ideally, the table will look like this:
email, first_purchase_date, processed_date, sku, first_order, order number
[email protected] 6/1/2019 6/1/2019 HG1555 HG1555 1
[email protected] 6/1/2019 8/1/2019 RF2655 HG1555 2
[email protected] 6/1/2019 8/1/2019 FP7789 HG1555 2
[email protected] 6/1/2019 11/1/2019 RF2655 HG1555 3
[email protected] 7/1/2019 7/1/2019 RF2655 RF2655 1
[email protected] 7/1/2019 8/1/2019 HG1555 RF2655 2
[email protected] 7/1/2019 8/1/2019 FP7789 RF2655 2
[email protected] 9/1/2019 6/1/2019 HG1555 HG1555 1
[email protected] 9/1/2019 11/1/2019 FP7789 HG1555 2
Basically, I want to look at each email and determine where the processed_date falls in respect to all other processed_dates that are registered to that user. The problem I'm trying to work around is when there are multiple line items for a particular processed_date.
Upvotes: 0
Views: 31
Reputation: 172944
Below is for BigQuery Standard SQL and assumes the processed_date column is of date type
#standardSQL
SELECT *,
DENSE_RANK() OVER(PARTITION BY email ORDER BY processed_date) order_number
FROM `project.dataset.table`
-- ORDER BY email, processed_date
If processed_date is a string - you can use below then
#standardSQL
SELECT *,
DENSE_RANK() OVER(PARTITION BY email ORDER BY PARSE_DATE('%d/%m/%Y', processed_date)) order_number
FROM `project.dataset.table`
-- ORDER BY email, PARSE_DATE('%d/%m/%Y', processed_date)
Upvotes: 1
Reputation: 1269443
This sounds like dense_rank()
:
select t.*,
dense_rank() over (partition by email order by processed_date) as ranking
from t;
Upvotes: 2