Reputation: 3
I'm trying to know how much record a user has generated during the last x hours from when the record is created. In classic SQL what I've tried so far should be working.
I'm posting here because I didn't find anything specific to Clickhouse about this specific problem.
Have a nice day ;)
Simplified example - source table:
ID | time | name
---+---------------+---------
1 05/05 14:20 bob
2 05/05 14:30 josh
3 05/05 18:30 bob
4 06/05 15:30 bob
5 08/05 18:30 josh
Select result - source table :
ID | time | name | nb_last_24hours
---+---------------+-------+----------------
1 05/05 14:20 bob 0
2 05/05 14:30 josh 0
3 05/05 18:30 bob 1
4 06/05 15:30 bob 1
5 08/05 18:30 josh 0
I've tried this query:
SELECT
b.ID, b.`time`, b.name,
(SELECT COUNT(*)
FROM T1 AS a
WHERE a.name = b.name
AND a.`time` < b.`time`
AND a.`time` >= (b.`time` - (60 * 60 * 24))
)
FROM
T1 AS b;
but I get the following error:
SQL Error [47] [07000]: Code: 47. DB::Exception: Missing columns: 'b.time' 'b.a_msisdn' while processing query: 'SELECT count() FROM VFG.voice_traffic AS a WHERE (a_msisdn = b.a_msisdn) AND (time < b.time) AND (time >= (b.time - ((60 * 60) * 24)))', required columns: 'a_msisdn' 'b.a_msisdn' 'time' 'b.time', maybe you meant: ['a_msisdn','a_msisdn','time','time']: While processing (SELECT count(*) FROM VFG.voice_traffic AS a WHERE (a.a_msisdn = b.a_msisdn) AND (a.time < b.time) AND (a.time >= (b.time - ((60 * 60) * 24)))) AS _subquery4916. (UNKNOWN_IDENTIFIER) (version 22.3.3.44 (official build)) , server ClickHouseNode [uri=http://192.168.13.79:8123/default, options={use_server_time_zone=false,use_time_zone=false}]@-1433098529
I'm assuming that b doesn't exist in the context of the inner query.
Upvotes: 0
Views: 520
Reputation: 12339
Try COUNT with RANGE.
SELECT
c1, c2
, COUNT (1) OVER
(
PARTITION BY c1
ORDER BY c2
RANGE BETWEEN
86400 PRECEDING
AND
1 PRECEDING
)
as cnt
FROM VALUES
(
('bob', now())
, ('bob', now() + INTERVAL 10 HOUR)
, ('bob', now() + INTERVAL 25 HOUR)
, ('josh', now())
, ('josh', now() + INTERVAL 10 HOUR)
, ('josh', now() + INTERVAL 15 HOUR)
)
FORMAT PrettyCompact
The result is:
+-c1---+------------------c2-+-cnt-+
| bob | 2023-09-15 20:22:34 | 0 |
| bob | 2023-09-16 06:22:34 | 1 |
| bob | 2023-09-16 21:22:34 | 1 |
| josh | 2023-09-15 20:22:34 | 0 |
| josh | 2023-09-16 06:22:34 | 1 |
| josh | 2023-09-16 11:22:34 | 2 |
+------+---------------------+-----+
Upvotes: 1
Reputation: 828
You should be able to accomplish that with one query - something like this:
SELECT
name,
time,
count()
FROM T1
WHERE time >= now() - INTERVAL 24 HOUR
GROUP BY (name, time);
Upvotes: 1