Reputation: 73
According to Druid doc, the DS_THETA function can work on sketches.
How could the below Druid query be rewritten in ClickHouse using the uniqTheta* variations?
SELECT THETA_SKETCH_ESTIMATE(
THETA_SKETCH_INTERSECT(
DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E1'),
DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E2')
)
) AS users
FROM ts_tutorial
Here's my attempt to solve it using MV
CREATE TABLE ts_tutorial
(
date_id Date,
uid String,
show String,
episode String
)
ENGINE = MergeTree()
ORDER BY (date_id, show, episode, uid);
CREATE TABLE tutorial_tbl
(
date_id Date,
show String,
episode String,
theta_uid AggregateFunction(uniqTheta, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (date_id, show, episode);
CREATE MATERIALIZED VIEW IF NOT EXISTS tutorial_mv TO tutorial_tbl
AS
SELECT
date_id,
show,
episode,
uniqThetaState(uid) as theta_uid
FROM ts_tutorial
GROUP BY date_id, show, episode
;
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-20','dan','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','carol','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-23','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-23','alice','Game of Thrones','S1E1');
And to answer: How many users watched both episodes of Bridgerton?
SELECT finalizeAggregation(
uniqThetaIntersect(
uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E1'),
uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E2')
)
) AS users
FROM tutorial_mv
The above query would return 0 instead of 1 for user carol. Which doesn't seem to work
Upvotes: 0
Views: 166
Reputation: 1
I know this is an old post, but hopefully a resolution might be useful to someone.
In this use-case you need to use uniqThetaMergeStateIf
rather than uniqThetaStateIf
.
uniqThetaState[If]
is used to create a uniq theta state (AKA uniqThetaSketch) of the values passed. The field theta_uid
is already a uniq theta state (created in the MV) so you're just looking to combine (i.e. merge) the states from records that meet your -If
criteria.
So we use these combinators:
-Merge
because we want to combine states-State
because we want to return the state (rather than count) for use in the uniqThetaIntersect
command-If
because we only want to combine states for records that meet a criteriaSo the query would be:
SELECT finalizeAggregation(
uniqThetaIntersect(
uniqThetaMergeStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E1'),
uniqThetaMergeStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E2')
)
) AS users
FROM tutorial_mv
Which returns 1 as expected.
Upvotes: 0
Reputation: 2450
Something like that
SELECT finalizeAggregation(
uniqThetaIntersect(
uniqThetaStateIf(theta_uid, "show" = 'Bridgerton' AND "episode" = 'S1E1'),
uniqThetaStateIf(theta_uid, "show" = 'Bridgerton' AND "episode" = 'S1E2')
)
) AS users
FROM ts_tutorial
look to https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators/#-if and https://clickhouse.com/docs/en/sql-reference/functions/uniqtheta-functions/
for details
Upvotes: 1