Tomas Conticello
Tomas Conticello

Reputation: 63

ROW_NUMBER() WITH CONDITION BIGQUERY

I would really appreciate some help with this. I have a set of data with purchases of tours. Each tour has a Purchaser_Email and Event_Date and more other columns not relevant. I want have a column trip to identify if that event is a new trip or is the same trip. To identify a new purchase as a new trip the difference between two Event_Dates has to be more than 30 Days. If not that tour is consider as the same trip. At the end what I need is to know how many trip the customer did and group the purchases by Trip. I make a query with ROW_NUMBER() and to calculate the date_diff between the first purchase and the next purchase. I feel that I am very close, but I need some help to add the Trip Column.

I need something like this: Desired Colum

In this File is the sample data set and the column I need: https://docs.google.com/spreadsheets/d/1ToNFQ9l2-ztDrN2zSlKlgBQk95vO6BnRv6VabWrHBmM/edit?usp=sharing The RAW data is the first Tab, The result of the query Below in the Second Tab with the columns in Orange, And the last column in Red is the column I am looking for.

WITH NumberedDates AS (
SELECT
City
,Booking
,Purchase_Date
, Purchaser_Email
,Guest_Info
,Addr_1
,City_7
,State_Province
,Country
, Gross_Sales
, Event_Date
, Event_Name
, MIN(Event_Date) OVER (PARTITION BY Purchaser_Email) as minPurchDate
, ROW_NUMBER() OVER (PARTITION BY Purchaser_Email ORDER BY Event_Date) AS RowNo
FROM SalesEatingEurope.DymTable )



SELECT
n1.City
, n1.Booking
, n1.Purchase_Date
, n1.Purchaser_Email
, n1.Guest_Info
, n1.Addr_1
, n1.City_7
, n1.State_Province
, n1.Country
, n1.Gross_Sales
, n1.Event_Name
, n1.Event_Date
, n1.RowNo as TransactionNumber
, n2.Event_Date as PrevEventDate
, IFNULL(date_diff(EXTRACT(DATE FROM n2.Event_Date), EXTRACT(DATE FROM n1.Event_Date) ,day), 0)*-1 AS DaysSincePrevEvent
, n1.minPurchDate as FirstEvent
, IFNULL(date_diff( EXTRACT(DATE FROM n1.minPurchDate), EXTRACT(DATE FROM n1.Event_Date) ,day), 0)*-1 AS DaysSinceFirstEvent
FROM NumberedDates  AS n1
LEFT JOIN NumberedDates  AS n2
ON n1.Purchaser_Email = n2.Purchaser_Email
AND n1.RowNo = n2.RowNo + 1
ORDER BY n1.Purchaser_Email, n1.Event_Date

Upvotes: 1

Views: 9953

Answers (1)

saifuddin778
saifuddin778

Reputation: 7277

You are going about right. After partitioning and assigning row_number() or rank(), you can assign a boolean parameter based on the condition of two purchases lagging with certain delta.

Here is a way to achieve this:

with data as (
  select purchaser_email, event_date, rank() over (partition by purchaser_email order by event_date) as indx from (
    select '[email protected]' as purchaser_email, date('2018-10-15') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-10-12') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-10-19') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-10-03') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-10-10') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-11-26') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-11-28') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-12-30') as event_date union all
    select '[email protected]' as purchaser_email, date('2018-12-31') as event_date
  )
)
select purchaser_email, count(1) as order_count from (
  select purchaser_email, 
    d1, new_purchase, sum(case when new_purchase=true then 1 else 0 end) over (partition by purchaser_email order by d1) as purchase_count from (
    select 
      t1.purchaser_email, 
      t1.event_date as d1, 
      t2.event_date as d2, 
      t1.indx as t1i,
      t2.indx as t2i,
      case 
        when t2.event_date is null then true 
        when abs(date_diff(t1.event_date, t2.event_date, day)) >= 30 then true 
        else false end as new_purchase
      from data t1
      left join data t2 on t1.purchaser_email = t2.purchaser_email and t1.indx-1 = t2.indx
  )
  order by 1,2,3
)
where new_purchase = true
group by 1
order by 1

Upvotes: 3

Related Questions