Reputation: 19
I'm generating a table like this in BigQuery with standard GoogleSQL:
sessionID | hitNumber | ListViewID | Position | Sponsored | isClick | impressions | clicks |
---|---|---|---|---|---|---|---|
aaa20230425 | 1 | aaa20230425-1 | 2 | 1 | false | 1 | 0 |
bbb20230425 | 1 | bbb20230425-1 | 6 | 0 | false | 1 | 0 |
bbb20230425 | 1 | bbb20230425-1 | 7 | 0 | false | 1 | 0 |
bbb20230425 | 1 | bbb20230425-1 | 8 | 0 | false | 1 | 0 |
bbb20230425 | 1 | bbb20230425-1 | 9 | 0 | false | 1 | 0 |
bbb20230425 | 1 | bbb20230425-1 | 10 | 0 | false | 1 | 0 |
bbb20230425 | 1 | bbb20230425-1 | 11 | 0 | false | 1 | 0 |
aaa20230425 | 2 | aaa20230425-2 | 2 | 1 | true | 0 | 1 |
ccc20230425 | 1 | ccc20230425-1 | 0 | 1 | false | 1 | 0 |
ddd20230425 | 17 | ddd20230425-17 | 1 | 1 | false | 1 | 0 |
My code:
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.hitNumber AS hitNumber,
CONCAT( fullvisitorID, CAST(visitId AS STRING), date, hits.hitNumber ) AS ListViewID,
product.productListPosition AS Position,
customDimensions.value AS Sponsored,
(CASE
WHEN product.isClick = TRUE THEN TRUE
ELSE FALSE
END) AS isClick,
SUM(
IF
(product.isImpression, 1,0)) AS impressions,
SUM(
IF
(product.isClick,1,0)) AS clicks
FROM
`Table`,
UNNEST (hits) AS hits,
UNNEST (hits.product) AS product,
UNNEST (product.customDimensions) AS customDimensions
WHERE
product.productListName IN ("searchpage", "categorypage")
AND customDimensions.index = 68
GROUP BY
SessionID, hitNumber, ListViewID, Position,
Sponsored, isClick
I need to create an additional column "parent_ListViewID" which:
Given the example above, I need to have a table like this:
sessionID | hitNumber | ListViewID | Position | Sponsored | isClick | impressions | clicks | parent_ListViewID |
---|---|---|---|---|---|---|---|---|
aaa20230425 | 1 | aaa20230425-1 | 2 | 1 | false | 1 | 0 | aaa20230425-1 |
bbb20230425 | 1 | bbb20230425-1 | 6 | 0 | false | 1 | 0 | bbb20230425-1 |
bbb20230425 | 1 | bbb20230425-1 | 7 | 0 | false | 1 | 0 | bbb20230425-1 |
bbb20230425 | 1 | bbb20230425-1 | 8 | 0 | false | 1 | 0 | bbb20230425-1 |
bbb20230425 | 1 | bbb20230425-1 | 9 | 0 | false | 1 | 0 | bbb20230425-1 |
bbb20230425 | 1 | bbb20230425-1 | 10 | 0 | false | 1 | 0 | bbb20230425-1 |
bbb20230425 | 1 | bbb20230425-1 | 11 | 0 | false | 1 | 0 | bbb20230425-1 |
aaa20230425 | 2 | aaa20230425-2 | 2 | 1 | true | 0 | 1 | aaa20230425-1 |
ccc20230425 | 1 | ccc20230425-1 | 0 | 1 | false | 1 | 0 | ccc20230425-1 |
ddd20230425 | 17 | ddd20230425-17 | 1 | 1 | false | 1 | 0 | ddd20230425-17 |
I've tried a solution with LAG() here but couldn't get it to work: Lag() with condition in sql server
I've also tried a few solutions with last_value, but failed as well - it returns last ListViewID value across all values with the same sessionID, not just from the previous rows.
Upvotes: 0
Views: 218
Reputation: 12234
You can consider below.
-- sample data
WITH sample_table AS (
-- put your sample data here
)
-- query starts here
SELECT *,
IF (
isClick IS TRUE,
LAST_VALUE(IF(isClick IS FALSE, ListViewID, NULL) IGNORE NULLS) OVER w,
ListViewID
) AS parent_ListViewID
FROM sample_table
WINDOW w AS (PARTITION BY sessionID ORDER BY hitNumber);
Query result
hitNumber
is INT64
type.Upvotes: 1