Calculate periodic counts in SQL where some periods may have zero count

Question

I have some data such as this:

id  tstamp
1   2021-06-04 09:00:00
2   2021-06-04 09:01:00
3   2021-06-04 10:00:00
4   2021-06-04 12:00:00
5   2021-06-04 14:00:00
6   2021-06-04 14:20:00

and I wish to count the rows between certain periods.

For example, if I wanted to count by hour, I could do something like this:

SELECT HOUR(tstamp), COUNT(*) from logs GROUP BY HOUR(tstamp)

However, this returns:

HOUR(tstamp)    COUNT(*)
9   2
10  1
12  1
14  2

but I would actually like:

HOUR(tstamp)    COUNT(*)
9   2
10  1
11  0
12  1
13  0
14  2

Moreover, in the actual use case, any period may be specified. It could be:

every minute
every 5 mins
every hour
every day
every 3 days
every week
every month

So I need a good way to handle any arbitrary period, if that is even possible. Failing that, a general approach that I can adapt to specific periods would also be fine.

Here is a dbfiddle

Edit: Following the comment by @georgeos I have solved the first part with:

WITH
periods (p) AS (SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14),
logs1 AS (SELECT HOUR(tstamp) as Hour, COUNT(*) AS Count from logs GROUP BY HOUR(tstamp))

SELECT periods.p, logs1.Count from periods
  LEFT JOIN logs1 on periods.p = logs1.Hour

although this doesn't strike me as an elegant solution, and still leaves me wondering how to handle other periods nicely.

PankajSanwal · Accepted Answer

Sample data generation:

create table mytable (id int, tstamp timestamp);

insert into mytable values (1,  '2021-06-04 09:00:00');
insert into mytable values (2,  '2021-06-04 09:01:00');
insert into mytable values (3,  '2021-06-04 10:00:00');
insert into mytable values (4,  '2021-06-04 12:00:00');
insert into mytable values (5,  '2021-06-04 14:00:00');
insert into mytable values (6,  '2021-06-04 14:20:00');
insert into mytable values (7,  '2021-06-05 11:00:00');
insert into mytable values (8,  '2021-06-05 11:01:00');
insert into mytable values (9,  '2021-06-05 12:00:00');
insert into mytable values (10, '2021-06-05 15:00:00');
insert into mytable values (11, '2021-06-05 16:00:00');
insert into mytable values (12, '2021-06-05 23:20:00');
insert into mytable values (13, '2021-06-05 09:20:00');

Let's go through this step by step, which should help you adapt it to other periods such as minutes, 5 minutes, days, etc.

The CTE builds a superset of the units you want to count by, which here is hours. It generates a single-column table of the 24 hours (0 to 23) that acts as our superset. I have used a recursive CTE because I am not a fan of writing static values with a lot of unions.
The first join after the CTE is a cross join with the distinct dates in your table. This is mandatory because hours repeat every day and your table might have data for multiple days.
The last left join matches the actual table data to each hour of each day. Counting the tstamp column of the matched rows then gives 0 for the hours that have no rows.
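To check the first step in isolation, the recursive CTE can be run on its own. This is only a small sketch that reuses the CTE from the final query; the summary columns n, first_hour and last_hour are names made up for this check.

```sql
-- Sanity check of the hour superset (assumes a database with
-- recursive CTE support, such as MySQL 8.0+).
WITH RECURSIVE RecCTE AS (
    SELECT 0 AS hours              -- anchor row: hour 0
    UNION ALL
    SELECT hours + 1               -- each step adds one hour
    FROM RecCTE
    WHERE hours <= 22              -- last generated value is 22 + 1 = 23
)
SELECT COUNT(*)   AS n,            -- 24 rows
       MIN(hours) AS first_hour,   -- 0
       MAX(hours) AS last_hour     -- 23
FROM RecCTE;
```

The bound is `<= 22` rather than `<= 23` because the condition is tested on the previous row before 1 is added to it.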

Final Query:

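-- Requires recursive CTE support (e.g. MySQL 8.0+).
with recursive RecCTE as
(
  -- anchor row: hour 0
  select 0 as hours
  union all
  -- add one hour per step; the last generated value is 23
  select x.hours + 1
  from RecCTE x
  where x.hours <= 22
)
-- count(column) ignores NULLs, so hours without rows get 0
select x.hours, z.dist_date, count(y.tstamp) as cnt
from RecCTE x
-- one row per hour for every date present in the table
cross join (select distinct date(tstamp) as dist_date from mytable) z
-- attach the real rows; hours without data keep a NULL tstamp
left join mytable y
  on x.hours = hour(y.tstamp) and date(y.tstamp) = z.dist_date
group by x.hours, z.dist_date
order by z.dist_date, x.hours;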

You can generalize these steps to port this query for other scenarios like month, week, minute etc.
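As one possible way to generalize, the sketch below generates the start of each period directly instead of a list of hour numbers, and then joins on a half-open time range. It is a variation on the query above, not the same query: the names buckets and bucket_start, the one-hour reporting window and the 5 minute step are all assumptions chosen for illustration, and it assumes MySQL 8.0+ and the mytable sample data.

```sql
-- Hypothetical example: 5 minute buckets over an assumed window
-- from 2021-06-04 09:00:00 (inclusive) to 10:00:00 (exclusive).
WITH RECURSIVE buckets AS (
    -- anchor row: start of the first bucket
    SELECT TIMESTAMP('2021-06-04 09:00:00') AS bucket_start
    UNION ALL
    -- each step moves forward by one period
    SELECT bucket_start + INTERVAL 5 MINUTE
    FROM buckets
    WHERE bucket_start + INTERVAL 5 MINUTE < '2021-06-04 10:00:00'
)
SELECT b.bucket_start,
       COUNT(y.tstamp) AS cnt      -- 0 for buckets with no rows
FROM buckets b
LEFT JOIN mytable y
       ON y.tstamp >= b.bucket_start
      AND y.tstamp <  b.bucket_start + INTERVAL 5 MINUTE
GROUP BY b.bucket_start
ORDER BY b.bucket_start;
```

With the sample data this should give 12 rows, with a count of 2 for the 09:00 bucket and 0 for the rest. Changing both occurrences of the step to, say, INTERVAL 1 HOUR, INTERVAL 3 DAY or INTERVAL 1 MONTH covers the other periods, and no cross join on dates is needed because each bucket already carries its date. Note that MySQL limits recursive CTEs through the cte_max_recursion_depth system variable (1000 by default), so a long window of small buckets may need that setting raised.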
