Reputation: 1
In BigQuery, I have a data set from Google Analytics on hit level containing, amongst others: date
, fullVisitorId
, visitId
, transactionRevenue
, and pageHome
(whether or not the page someone is on is the homepage or not). For each visitor, I want to calculate the running total of pageHome
until a transaction is made. If a transaction is made, then transactionRevenue
is a number, otherwise it's null. For an example, please see the attached picture.
I have tried a sum over the partition by fullVisitorId
and ordering by visitId
. However, I'm not sure how to reset when transactionId
is not null.
SELECT
date,
fullVisitorId,
visitId,
visitNumber,
transactionRevenue,
pageHome,
SUM(pageHome) OVER(PARTITION BY fullVisitorId ORDER BY visitId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sumpageHome
This code calculates the running total per visitor, but it does not reset yet after a transaction is made.
Upvotes: 0
Views: 1217
Reputation: 1270873
For each visitor, I want to calculate the running total of pageHome until a transaction is made.
Use a subquery to calculate the transaction number. Then use that in a cumulative sum:
SELECT SUM(pageHome) OVER (PARTITION BY fullVisitorId, transaction_count ORDER BY DATE) as running_pageHome
FROM (SELECT t.*,
COUNT(TransactionRevenue) OVER (PARTITION BY fullVisitorId ORDER BY DATE) as transaction_count
FROM t
) t
Upvotes: 2