Plouf

Reputation: 397

Run a SQL query against ten-minute time intervals

I have a PostgreSQL table, belt, with this schema:

id SERIAL PRIMARY KEY,
traveltime INT,
departuredate TIMESTAMPTZ,
departurehour TIMETZ

Here is a bit of data (edited):

  id  | traveltime |     departuredate      | departurehour
------+------------+------------------------+---------------
    1 |         73 | 2019-12-24 00:00:03+01 | 00:00:03+01
    2 |         73 | 2019-12-24 00:12:16+01 | 00:12:16+01
   53 |        115 | 2019-12-24 07:53:44+01 | 07:53:44+01
   54 |        116 | 2019-12-24 07:58:45+01 | 07:58:45+01
   55 |        119 | 2019-12-24 08:03:46+01 | 08:03:46+01
   56 |        120 | 2019-12-24 08:08:47+01 | 08:08:47+01
   57 |        121 | 2019-12-24 08:13:48+01 | 08:13:48+01
   58 |        121 | 2019-12-24 08:18:48+01 | 08:18:48+01
  542 |        112 | 2019-12-26 07:52:41+01 | 07:52:41+01 
  543 |        114 | 2019-12-26 07:57:42+01 | 07:57:42+01
  544 |        116 | 2019-12-26 08:02:43+01 | 08:02:43+01
  545 |        116 | 2019-12-26 08:07:44+01 | 08:07:44+01
  546 |        117 | 2019-12-26 08:12:45+01 | 08:12:45+01
  547 |        118 | 2019-12-26 08:17:46+01 | 08:17:46+01
  548 |        118 | 2019-12-26 08:22:48+01 | 08:22:48+01
 1031 |         80 | 2019-12-28 07:50:33+01 | 07:50:33+01
 1032 |         81 | 2019-12-28 07:55:34+01 | 07:55:34+01
 1033 |         81 | 2019-12-28 08:00:35+01 | 08:00:35+01
 1034 |         82 | 2019-12-28 08:05:36+01 | 08:05:36+01
 1035 |         82 | 2019-12-28 08:10:37+01 | 08:10:37+01
 1036 |         83 | 2019-12-28 08:15:38+01 | 08:15:38+01
 1037 |         83 | 2019-12-28 08:20:39+01 | 08:20:39+01

I'd like to get the average of all the traveltime values collected in each 10-minute interval of the day, over several weeks.

Expected result for the data sample: for the 10-minute interval between 08:00 and 08:10, the rows included in the average are those with id 55, 56, 544, 545, 1033 and 1034, and so on for the other intervals.

I can get the average for a specific interval:

select avg(traveltime) from belt where departurehour >= '10:40:00+01' and departurehour < '10:50:00+01';

To avoid writing one query per interval, I used this query to generate all the 10-minute intervals for the whole period covered by the data:

select i from generate_series('2019-11-23', '2020-01-18', '10 minutes'::interval) i;

What I am missing is a way to apply my AVG query to each of these generated intervals. Any pointers would be helpful!

Upvotes: 0

Views: 789

Answers (1)

Belayer

Reputation: 14861

It turns out that generate_series does not actually apply here, regardless of the date range. The critical part is the 144 ten-minute intervals per day. Unfortunately, Postgres does not provide a built-in range type for time values (perhaps creating one would be a useful exercise). But all is not lost: you can simulate the same thing with BETWEEN; you just need to play with the end of each range.
The following generates this simulation using a recursive CTE, then joins it to your table as before.

set timezone to '+1';    -- necessary to keep my local offset from affecting results. 
-- create table and insert data here 
-- (additional rows added outside the date range, so they should not be included) 
with recursive min_intervals as 
       (select '00:00:00'::timetz        start_10Min   -- start of 1st 10Min interval
             , '00:09:59.999999'::timetz end_10Min     -- last microsecond in 10Min interval
             , 1 interval_no
        union all 
        select start_10Min + interval '10 min'        
             , end_10Min   + interval '10 min'  
             , interval_no + 1
          from min_intervals
         where interval_no < 144                   -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
       )  -- select * from min_intervals;
select start_10Min, end_10Min, avg(traveltime) average_travel_time
  from min_intervals
  join belt  
     on departuredate::time between start_10Min and end_10Min
  where departuredate::date between date '2019-11-23' and date '2020-01-18'  
  group by start_10Min, end_10Min
  order by start_10Min;   

-- Test the result for the specified interval. Note: the added rows fall within the 08:00 to 08:10 time frame,
-- but they should be excluded, so the average for that period should be the same for both queries.
 select avg(traveltime) from belt where id in (55, 56, 544, 545, 1033, 1034); 
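
A possible variation on the query above, offered as a sketch rather than a tested replacement: generate_series is of no use for enumerating the date range, but it can number the 144 daily slots (0 to 143), which removes the need for the recursive CTE. The table belt, its columns and the CTE name min_intervals come from the query above; the alias slot(n) and the use of plain time instead of timetz are assumptions made for illustration. The dates are hard-coded here exactly as they are above.

```sql
-- Sketch only: slot(n) is a hypothetical alias; time is used instead of timetz (assumption).
with min_intervals as
       (select time '00:00:00' + n * interval '10 minutes'   start_10Min   -- start of the nth slot
             , time '00:00:00' + n * interval '10 minutes'
                               + interval '00:09:59.999999'  end_10Min     -- last microsecond of the slot
          from generate_series(0, 143) as slot(n)            -- 6 slots/hr * 24 hr/day = 144 slots
       )
select start_10Min, end_10Min, avg(traveltime) average_travel_time
  from min_intervals
  join belt
    on departuredate::time between start_10Min and end_10Min
 where departuredate::date between date '2019-11-23' and date '2020-01-18'
 group by start_10Min, end_10Min
 order by start_10Min;
```

The end of each slot is kept at the last microsecond, as in the recursive version, because adding a full 10 minutes to the 23:50 slot would wrap around to 00:00. Note that departuredate::time is evaluated in the session time zone, so the set timezone line still matters.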

My issue with the above is that the date range is essentially hard-coded and must be changed manually (yes, substitution parameters are available). That is OK for psql or an IDE, but not good for a production environment. If this is to be used in such an environment, I'd use the following function to return a virtual table of the same results.

 create or replace function travel_average_per_10Min_interval(
                            start_date_in date
                          , end_date_in   date
                          ) 
returns table (Start_10Min     timetz
              ,end_10Min       timetz
              ,avg_travel_time numeric
              )
language sql
as $$
    with recursive min_intervals as 
           (select '00:00:00'::timetz        start_10Min   -- start of 1st 10Min interval
                 , '00:09:59.999999'::timetz end_10Min     -- last microsecond in 10Min interval
                 , 1 interval_no
            union all 
            select start_10Min + interval '10 min'        
                 , end_10Min   + interval '10 min'  
                 , interval_no + 1
              from min_intervals
             where interval_no < 144                        -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
           )  -- select * from min_intervals;
    select start_10Min, end_10Min, avg(traveltime) average_travel_time
      from min_intervals
      join belt  
        on departuredate::time between start_10Min and end_10Min
     where departuredate::date between start_date_in and end_date_in  
     group by start_10Min, end_10Min
     order by start_10Min;                  
$$;

-- test 
select * from travel_average_per_10Min_interval(date '2019-11-23', date '2020-01-18');
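
If the explicit list of intervals is not needed, the slot can also be computed directly from each row's time of day, with no join at all. This is a minimal sketch under the same assumptions as above (table belt, the same hard-coded date range, session time zone set as shown earlier); it is a different technique from the function, not a drop-in equivalent, and the column alias start_10Min is reused only for readability.

```sql
-- Sketch only: bucket each row by integer division of its seconds-since-midnight by 600.
select time '00:00:00'
       + floor(extract(epoch from departuredate::time) / 600)::int
         * interval '10 minutes'                    start_10Min          -- start of the row's slot
     , avg(traveltime)                              average_travel_time
  from belt
 where departuredate::date between date '2019-11-23' and date '2020-01-18'
 group by 1
 order by 1;
```

Like the inner join used above, this returns only the slots that contain at least one row; a slot with no departures simply does not appear in the result.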

Upvotes: 1
