stackoverflow2019
stackoverflow2019

Reputation: 73

ClickHouse uniqTheta* variations

According to Druid doc, the DS_THETA function can work on sketches.

How could the below Druid query be rewritten in ClickHouse using the uniqTheta* variations?

SELECT THETA_SKETCH_ESTIMATE(
         THETA_SKETCH_INTERSECT(
           DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E1'),
           DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E2')
         )
       ) AS users
FROM ts_tutorial

Here's my attempt to solve it using MV

CREATE TABLE ts_tutorial
(
   date_id Date,
   uid String,
   show String,
   episode String
)
ENGINE = MergeTree()
ORDER BY (date_id, show, episode, uid);

CREATE TABLE tutorial_tbl
(
   date_id Date,
   show String,
   episode String,
   theta_uid AggregateFunction(uniqTheta, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (date_id, show, episode);

CREATE MATERIALIZED VIEW IF NOT EXISTS tutorial_mv TO tutorial_tbl
AS
    SELECT
        date_id,
        show,
        episode,
        uniqThetaState(uid) as theta_uid

    FROM ts_tutorial
    GROUP BY date_id, show, episode
;

INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-20','dan','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','carol','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-23','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-23','alice','Game of Thrones','S1E1');

And to answer: How many users watched both episodes of Bridgerton?

SELECT finalizeAggregation(
         uniqThetaIntersect(
           uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E1'),
           uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E2')
         )
       ) AS users
FROM tutorial_mv

The above query would return 0 instead of 1 for user carol. Which doesn't seem to work

Upvotes: 0

Views: 166

Answers (2)

Jason
Jason

Reputation: 1

I know this is an old post, but hopefully a resolution might be useful to someone.

In this use-case you need to use uniqThetaMergeStateIf rather than uniqThetaStateIf.

uniqThetaState[If] is used to create a uniq theta state (AKA uniqThetaSketch) of the values passed. The field theta_uid is already a uniq theta state (created in the MV) so you're just looking to combine (i.e. merge) the states from records that meet your -If criteria.

So we use these combinators:

  • -Merge because we want to combine states
  • -State because we want to return the state (rather than count) for use in the uniqThetaIntersect command
  • -If because we only want to combine states for records that meet a criteria

So the query would be:

SELECT finalizeAggregation(
  uniqThetaIntersect(
    uniqThetaMergeStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E1'),
    uniqThetaMergeStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E2')
  )
) AS users
FROM tutorial_mv

Which returns 1 as expected.

Upvotes: 0

Slach
Slach

Reputation: 2450

Something like that

SELECT finalizeAggregation(
         uniqThetaIntersect(
           uniqThetaStateIf(theta_uid, "show" = 'Bridgerton' AND "episode" = 'S1E1'),
           uniqThetaStateIf(theta_uid, "show" = 'Bridgerton' AND "episode" = 'S1E2')
         )
       ) AS users
FROM ts_tutorial

look to https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators/#-if and https://clickhouse.com/docs/en/sql-reference/functions/uniqtheta-functions/

for details

Upvotes: 1

Related Questions