Timo Rietveld
Timo Rietveld

Reputation: 541

BigQuery: identify all hits preceding a transaction

Suppose the following BQ-query:

SELECT * FROM (
SELECT CONCAT(CAST(fullVisitorId AS String), CAST(visitId AS STRING)) AS CustVisitId, hits.hitnumber AS HitNr, hits.transaction.transactionRevenue/1000000 AS Rev
FROM
  `[projectid].[dataset].ga_sessions_*` AS t, t.hits AS hits
WHERE
  _TABLE_SUFFIX BETWEEN '20190120' AND '20190120'
ORDER BY HitNr)
WHERE CustVisitId = '49889154628941922861547956449'

This produces a table like this:

CustVisitId HitNr   Rev
123456789   1   
123456789   2   
123456789   3   
123456789   4       8
123456789   5   
123456789   6   
123456789   7   
123456789   8   
123456789   9       40
123456789   10  

Now, I'd like to add a column ('New' in example below) to identify all hits preceding (and including) a transaction, like this:

CustVisitId HitNr   Rev   New
123456789   1             1
123456789   2             1
123456789   3             1
123456789   4       8     1
123456789   5             2
123456789   6             2
123456789   7             2
123456789   8             2
123456789   9       40    2
123456789   10            3

Does anyone know how to adjust the query so that the required table is produced?

Upvotes: 1

Views: 591

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardSQL
SELECT *,
  1 + (COUNT(Rev) OVER(PARTITION BY CustVisitId ORDER BY HitNr ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) `New`
FROM (
SELECT CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS CustVisitId, hits.hitnumber AS HitNr, hits.transaction.transactionRevenue/1000000 AS Rev
FROM
  `[projectid].[dataset].ga_sessions_*` AS t, t.hits AS hits
WHERE
  _TABLE_SUFFIX BETWEEN '20190120' AND '20190120'
ORDER BY HitNr)
WHERE CustVisitId = '49889154628941922861547956449'      

As you can see I left your query unchanged and just added one line for New

Upvotes: 2

Related Questions