HossBender
HossBender

Reputation: 1079

Determining Where Date Ranks | BigQuery

I have purchase data that I'm trying to analyze. I'm attempting to find the 2nd and 3rd purchases (and potentially 4, 5, etc.) for each user.

The data I have currently looks like this:

email, first_purchase_date, processed_date, sku, first_order
[email protected]   6/1/2019   6/1/2019   HG1555   HG1555
[email protected]   6/1/2019   8/1/2019   RF2655   HG1555
[email protected]   6/1/2019   8/1/2019   FP7789   HG1555
[email protected]   6/1/2019  11/1/2019   RF2655   HG1555
[email protected]   7/1/2019   7/1/2019   RF2655   RF2655
[email protected]   7/1/2019   8/1/2019   HG1555   RF2655
[email protected]   7/1/2019   8/1/2019   FP7789   RF2655
[email protected]   9/1/2019   6/1/2019   HG1555   HG1555
[email protected]   9/1/2019  11/1/2019   FP7789   HG1555

The first order column is looking at what was in the first basket that the customer purchased. Is there a way I can create another column that will tell me exactly what purchase number each line item is? Ideally, the table will look like this:

email, first_purchase_date, processed_date, sku, first_order, order number
[email protected]   6/1/2019   6/1/2019   HG1555   HG1555   1
[email protected]   6/1/2019   8/1/2019   RF2655   HG1555   2
[email protected]   6/1/2019   8/1/2019   FP7789   HG1555   2
[email protected]   6/1/2019  11/1/2019   RF2655   HG1555   3
[email protected]   7/1/2019   7/1/2019   RF2655   RF2655   1
[email protected]   7/1/2019   8/1/2019   HG1555   RF2655   2
[email protected]   7/1/2019   8/1/2019   FP7789   RF2655   2
[email protected]   9/1/2019   6/1/2019   HG1555   HG1555   1
[email protected]   9/1/2019  11/1/2019   FP7789   HG1555   2

Basically, I want to look at each email and determine where the processed_date falls in respect to all other processed_dates that are registered to that user. The problem I'm trying to work around is when there are multiple line items for a particular processed_date.

Upvotes: 0

Views: 31

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL and assumes the processed_date column is of date type

#standardSQL
SELECT *,
  DENSE_RANK() OVER(PARTITION BY email ORDER BY processed_date) order_number
FROM `project.dataset.table`
-- ORDER BY email, processed_date   

If processed_date is a string - you can use below then

#standardSQL
SELECT *,
  DENSE_RANK() OVER(PARTITION BY email ORDER BY PARSE_DATE('%d/%m/%Y', processed_date)) order_number
FROM `project.dataset.table`
-- ORDER BY email, PARSE_DATE('%d/%m/%Y', processed_date)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269443

This sounds like dense_rank():

select t.*,
       dense_rank() over (partition by email order by processed_date) as ranking
from t;

Upvotes: 2

Related Questions