Reputation: 3931

PostgreSQL splitting time range into days

I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.

When I retrieve them, though, I want to split the ranges at day boundaries - so for example:

2013-01-01 00:00:00 to 2013-01-02 23:59:59

would be selected as two rows:

2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59

with the values in the other columns the same for both retrieved entries.

I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.

I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY statement is an Oracle extension to the SQL standard, so I can't use it.

I believe I can achieve this using PostgreSQL's generate_series, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.

This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM table in a joined subquery), but I believe this is more-or-less the right track.

Here's the fiddle with the schema, sample data, and my working query.

Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.

Upvotes: 7

Answers (3)

Erwin Brandstetter

Reputation: 656231

First off, your handling of upper bounds is broken. A timestamp with 23:59:59 is no good. The data type timestamp allows fractional digits (currently µs resolution). What about 2013-10-18 23:59:59.123::timestamp?

Include the lower bound and exclude the upper bound everywhere. See:

Calculate number of concurrent events in SQL

Building on this premise:

Since Postgres 9.3

Use a LATERAL subquery.

SELECT id
     , CASE WHEN sday = d THEN stime ELSE d                    END AS stime
     , CASE WHEN eday = d THEN etime ELSE d + interval '1 day' END AS etime
FROM  (
   SELECT id, stime, etime
        , date_trunc('day', stime) AS sday
        , date_trunc('day', etime) AS eday
   FROM   timesheet_entries
   ) t
     , generate_series(sday, eday, interval '1 day') d
WHERE  d < etime  -- filter noise row for etime at 00:00
ORDER  BY id, stime;

fiddle

The subquery t adds sday and eday, which are stime and etime respectively, truncated to the day. For repeated use.

It's best to call generate_series() with timestamp input. See:

Generating time series between two dates in PostgreSQL

When etime falls on 00:00 exactly, a row with a zero time range would be added. The added filter WHERE d < etime discards the noise.

In the above query,

 , generate_series(t.sday, t.eday, interval '1 day') d

is short syntax for:

CROSS JOIN LATERAL generate_series(t.sday, t.eday, interval '1 day') d

See:

What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

Postgres 9.2 or older

-- Postgres 9.2 (with corner case fix)
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d     END AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM (
   SELECT id, stime, etime
        , generate_series(stime, etime, interval '1 day')::date AS d
   FROM   timesheet_entries
   ) t
WHERE  d < etime  -- filter noise row for etime at 00:00
ORDER  BY id, stime;

fiddle

Upvotes: 15

Daniel Vérité

Reputation: 61506

You may build the result by simply generating all days in the entire set of ranges and join this to the ranges themselves using the OVERLAPS operator. This will both discard the days for which there is no data and do the required cartesian product for the days with data.

OVERLAPS behavior is explained in the doc as this:

(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

With your timesheet_entries table, the query would be:

select days.day, timesheet_entries.* from
  (select day from generate_series(
         (select min(stime) from timesheet_entries),
         (select max(etime) from timesheet_entries),
         '1 day'::interval) day
  ) days
 join timesheet_entries 
  on (stime,etime) overlaps (days.day,days.day+'1 day'::interval)
 order by 1;

This technique makes it also easy to add the days without data, calendar-like. To do so, just replace the join by a left join.

As pointed by @Erwin in his answer, etime should not represent the last second (...:59:59) of the interval but the next second as an excluded upper bound. It may not matter with your current data, but if certain rows had stime at ...:59:59 or if you happened to need sub-second resolution, that would be a problem.

Upvotes: 1

Pavel Stehule

Reputation: 45750

There is simply solution (if intervals starts in same time)

postgres=# select i, i + interval '1day' - interval '1sec' 
  from generate_series('2013-01-01 00:00:00'::timestamp, '2013-01-02 23:59:59', '1day') g(i);
          i          │      ?column?       
─────────────────────┼─────────────────────
 2013-01-01 00:00:00 │ 2013-01-01 23:59:59
 2013-01-02 00:00:00 │ 2013-01-02 23:59:59
(2 rows)

I wrote a table function, that do it for any interval. It is fast - two years range divide to 753 ranges in 10ms

create or replace function day_ranges(timestamp, timestamp)
returns table(t1 timestamp, t2 timestamp) as $$
begin
  t1 := $1;
  if $2 > $1 then
    loop
      if t1::date = $2::date then
        t2 := $2;
        return next;
        exit;
      end if;
      t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';
      return next;
      t1 := t2 + interval '1sec';
    end loop;
  end if;
  return;
end;
$$ language plpgsql;

Result:

postgres=# select * from day_ranges('2013-10-08 22:00:00', '2013-10-10 23:00:00');
         t1          │         t2          
─────────────────────┼─────────────────────
 2013-10-08 22:00:00 │ 2013-10-09 23:59:59
 2013-10-09 00:00:00 │ 2013-10-09 23:59:59
 2013-10-10 00:00:00 │ 2013-10-10 23:00:00
(3 rows)

Time: 6.794 ms

and faster (and little bit longer) version based on RETURN QUERY

create or replace function day_ranges(timestamp, timestamp)
returns table(t1 timestamp, t2 timestamp) as $$
begin
  t1 := $1; t2 := $2;
  if $1::date = $2::date then
    return next;
  else
    -- first day
    t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';
    return next;
    if $2::date > $1::date + 1 then
      return query select d, d + interval '1day' - interval '1sec'
                      from generate_series(date_trunc('day', $1 + interval '1day')::timestamp,
                                           date_trunc('day', $2 - interval '1day')::timestamp,
                                           '1day') g(d);
    end if;
    -- last day 
    t1 := date_trunc('day', $2); t2 := $2;
    return next;
  end if;
  return;
end;
$$ language plpgsql;

Upvotes: 2

PostgreSQL splitting time range into days

Answers (3)

Since Postgres 9.3

Postgres 9.2 or older

Related Questions