Reputation: 369
I want to create a feature similar to sequential segmentation provided by Adobe Analytics (Ref) I am using an external BQ table created on top of clickstream data stored in GCS location in the format of parquet, partitioned on date_key and event_name.
Data schema:
date_key, user_id, session_id, event_name
My use case could have been easily solved with the use of MATCH_RECOGNIZE; however, since I am limited to using BigQuery for certain reasons and MATCH_RECOGNIZE is not supported in BigQuery, it has become challenging to find a solution.
I was trying UNNEST, but it is not scaling for larger data Here is the sample query I am using for segment: A then B then C then D, with some operator based conditions in between.
SELECT
session_id,
ROW_NUMBER() OVER (PARTITION BY session_id) as hit_counter,
FROM `table`
WHERE
date_key >= 20241201 and date_key <= 20241231
AND event_name IN ('A', 'B', 'C', 'D')
),
event_mapping AS (
SELECT
session_id,
ARRAY_AGG(STRUCT(event_name, hit_counter)) AS event_sequence
FROM filtered_ds
GROUP BY session_id
),
combinations AS (
SELECT
session_id
FROM event_mapping,
UNNEST(event_sequence) a,
UNNEST(event_sequence) b,
UNNEST(event_sequence) c,
UNNEST(event_sequence) d,
UNNEST(event_sequence) e
WHERE
a.event_name = 'A' AND
b.event_name = 'B' AND
c.event_name = 'C' AND
d.event_name = 'D' AND
e.event_name = 'E' AND
b.hit_counter > a.hit_counter
c.hit_counter - b.hit_counter BETWEEN 0 AND 10 AND
d.hit_counter - c.hit_counter > 15 AND
e.hit_counter > d.hit_counter
)
SELECT COUNT(DISTINCT session_id)
FROM combinations;```
Upvotes: 0
Views: 20