Joe King

Reputation: 3021

Calculate periodic counts in SQL where some periods may have zero count

I have some data such as this:

id  tstamp
1   2021-06-04 09:00:00
2   2021-06-04 09:01:00
3   2021-06-04 10:00:00
4   2021-06-04 12:00:00
5   2021-06-04 14:00:00
6   2021-06-04 14:20:00

and I wish to count the rows between certain periods.

For example, if I wanted to count by hour, I could do something like this:

SELECT HOUR(tstamp), COUNT(*) from logs GROUP BY HOUR(tstamp)

However, this returns:

HOUR(tstamp)    COUNT(*)
9   2
10  1
12  1
14  2

but I would actually like:

9   2
10  1
11  0
12  1
13  0
14  2

Moreover, in the actual use case, any period may be specified. It could be:

so I would need to find a good way to handle any arbitrary period, if that's even possible (?) or at least if there is a general approach that I can modify for specific periods, that would also be fine.

Here is a dbfiddle

Edit: Following the comment by @georgeos I have solved the first part with:

WITH
periods (p) AS (SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14),
logs1 AS (SELECT HOUR(tstamp) as Hour, COUNT(*) AS Count from logs GROUP BY HOUR(tstamp))

SELECT periods.p, logs1.Count from periods
  LEFT JOIN logs1 on periods.p = logs1.Hour

although this doesn't strike me as an elegant solution, and still leaves me wondering how to handle other periods nicely.

Upvotes: 0

Views: 110

Answers (1)

PankajSanwal

Reputation: 1019

Sample data generation:

create table mytable (id int, tstamp timestamp);

insert into mytable values (1  , '2021-06-04 09:00:00');
insert into mytable values (2 ,  '2021-06-04 09:01:00');
insert into mytable values (3  , '2021-06-04 10:00:00');
insert into mytable values (4   ,'2021-06-04 12:00:00');
insert into mytable values (5   ,'2021-06-04 14:00:00');
insert into mytable values (6   ,'2021-06-04 14:20:00');
insert into mytable values (7  , '2021-06-05 11:00:00');
insert into mytable values (8 ,  '2021-06-05 11:01:00');
insert into mytable values (9  , '2021-06-05 12:00:00');
insert into mytable values (10   ,'2021-06-05 15:00:00');
insert into mytable values (11  ,'2021-06-05 16:00:00');
insert into mytable values (12  ,'2021-06-05 23:20:00');
insert into mytable values (13  ,'2021-06-05 09:20:00');

Let's go through this step by step, which should help you adapt it to other scenarios such as minute, 5 minutes, day, etc.

  1. The CTE builds a superset of the units you want to count by, here hours. It generates a single-column table of the 24 hours (0 to 23). I have used a recursive CTE because I am not a fan of writing static values with a lot of unions.

  2. The first join after the CTE is a cross join with the distinct dates in your table. This is needed because hours repeat every day and your table might have data for multiple days.

  3. The last left join matches the actual table data with each hour of each day. We then count the tstamp column for the matched rows; since count(column) ignores NULLs, hours with no matching rows get a count of 0.

Final Query:

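-- Step 1: generate the hours 0..23 as a single-column superset
with recursive RecCTE as (
  select 0 as hours
  union all
  select x.hours + 1
  from RecCTE x
  where x.hours <= 22
)
select x.hours, z.dist_date, count(y.tstamp) as cnt
from RecCTE x
-- Step 2: pair every hour with every date present in the table
cross join (select distinct date(tstamp) as dist_date from mytable) z
-- Step 3: attach matching rows; count(y.tstamp) is 0 when nothing matches
left join mytable y
  on x.hours = hour(y.tstamp)
 and date(y.tstamp) = z.dist_date
group by x.hours, z.dist_date
order by z.dist_date, x.hours;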

You can generalize these steps to port this query for other scenarios like month, week, minute etc.
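
For an arbitrary period, one possible variation is to let the recursive CTE generate the bucket start times directly instead of hour numbers. This is only a sketch, assuming MySQL 8.0+ and the mytable sample above; the user variables @range_start, @range_end and @step_min are names I made up for illustration, so substitute your own range and period length.

```sql
-- Hypothetical parameters (not from the question): adjust to your use case.
set @range_start = '2021-06-04 09:00:00';  -- first bucket starts here
set @range_end   = '2021-06-04 15:00:00';  -- exclusive upper bound
set @step_min    = 30;                     -- bucket width in minutes

with recursive buckets as (
  -- Anchor row: the first bucket start, cast so the column is a DATETIME
  select cast(@range_start as datetime) as bucket_start
  union all
  -- Recursive part: step forward by one bucket until the range end
  select bucket_start + interval @step_min minute
  from buckets
  where bucket_start + interval @step_min minute < cast(@range_end as datetime)
)
select b.bucket_start, count(y.tstamp) as cnt
from buckets b
-- Half-open interval [bucket_start, bucket_start + step) so no row is counted twice
left join mytable y
  on y.tstamp >= b.bucket_start
 and y.tstamp <  b.bucket_start + interval @step_min minute
group by b.bucket_start
order by b.bucket_start;
```

Because the buckets are full timestamps, no cross join with the distinct dates is needed here, and days with no rows at all still appear. Note that MySQL limits recursion through cte_max_recursion_depth (1000 by default), so a long range with a small step may need that session variable raised.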

Upvotes: 2
