MSepehr
MSepehr

Reputation: 970

Sampling huge database based on multiple criteria

I want to sample from a database table by these condition: my sample should have at least 100 sample in every hour of every day from specific category column . table is as below :

id recieved_time    catgory
1  2022-08-01 4:05   blue
1  2022-08-01 5:05   red
1  2022-08-01 7:05   red
...

can anyone help me to write an efficient sql query ? (by the way i am using clickhouse if that help)

Upvotes: 0

Views: 130

Answers (1)

John K.
John K.

Reputation: 525

Try ROW_NUMBER() without an ORDER BY condition if you just want a random sampling of 100 per category:

SELECT
  *
FROM
  (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY
        category,
        DATE_TRUNC('hour', received_time)
      ORDER BY
        (SELECT NULL) -- change if you care
    ) AS category_rownumber
  FROM
    TableName
  ) X
WHERE
  category_rownumber <= 100
ORDER BY
  category,
  received_time DESC
;

Upvotes: 1

Related Questions