Reputation: 3931
I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.
When I retrieve them, though, I want to split the ranges at day boundaries - so for example:
2013-01-01 00:00:00 to 2013-01-02 23:59:59
would be selected as two rows:
2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59
with the values in the other columns the same for both retrieved entries.
I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.
I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY
statement is an Oracle extension to the SQL standard, so I can't use it.
I believe I can achieve this using PostgreSQL's generate_series
, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.
This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM
table in a joined subquery), but I believe this is more-or-less the right track.
Here's the fiddle with the schema, sample data, and my working query.
Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT
part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.
Upvotes: 7
Views: 11764
Reputation: 656231
First off, your handling of upper bounds is broken. A timestamp with 23:59:59
is no good. The data type timestamp
allows fractional digits (currently µs resolution). What about 2013-10-18 23:59:59.123::timestamp
?
Include the lower bound and exclude the upper bound everywhere. See:
Building on this premise:
Use a LATERAL
subquery.
SELECT id
, CASE WHEN sday = d THEN stime ELSE d END AS stime
, CASE WHEN eday = d THEN etime ELSE d + interval '1 day' END AS etime
FROM (
SELECT id, stime, etime
, date_trunc('day', stime) AS sday
, date_trunc('day', etime) AS eday
FROM timesheet_entries
) t
, generate_series(sday, eday, interval '1 day') d
WHERE d < etime -- filter noise row for etime at 00:00
ORDER BY id, stime;
The subquery t
adds sday
and eday
, which are stime
and etime
respectively, truncated to the day. For repeated use.
It's best to call generate_series()
with timestamp
input. See:
When etime
falls on 00:00
exactly, a row with a zero time range would be added. The added filter WHERE d < etime
discards the noise.
In the above query,
, generate_series(t.sday, t.eday, interval '1 day') d
is short syntax for:
CROSS JOIN LATERAL generate_series(t.sday, t.eday, interval '1 day') d
See:
-- Postgres 9.2 (with corner case fix)
SELECT id
, CASE WHEN stime::date = d THEN stime ELSE d END AS stime
, CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM (
SELECT id, stime, etime
, generate_series(stime, etime, interval '1 day')::date AS d
FROM timesheet_entries
) t
WHERE d < etime -- filter noise row for etime at 00:00
ORDER BY id, stime;
Upvotes: 15
Reputation: 61506
You may build the result by simply generating all days in the entire set of ranges and join this to the ranges themselves using the OVERLAPS operator. This will both discard the days for which there is no data and do the required cartesian product for the days with data.
OVERLAPS behavior is explained in the doc as this:
(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.
With your timesheet_entries
table, the query would be:
select days.day, timesheet_entries.* from
(select day from generate_series(
(select min(stime) from timesheet_entries),
(select max(etime) from timesheet_entries),
'1 day'::interval) day
) days
join timesheet_entries
on (stime,etime) overlaps (days.day,days.day+'1 day'::interval)
order by 1;
This technique makes it also easy to add the days without data, calendar-like. To do so, just replace the join
by a left join
.
As pointed by @Erwin in his answer, etime
should not represent the last second (...:59:59
) of the interval but the next second as an excluded upper bound. It may not matter with your current data, but if certain rows had stime
at ...:59:59
or if you happened to need sub-second resolution, that would be a problem.
Upvotes: 1
Reputation: 45750
There is simply solution (if intervals starts in same time)
postgres=# select i, i + interval '1day' - interval '1sec' from generate_series('2013-01-01 00:00:00'::timestamp, '2013-01-02 23:59:59', '1day') g(i); i │ ?column? ─────────────────────┼───────────────────── 2013-01-01 00:00:00 │ 2013-01-01 23:59:59 2013-01-02 00:00:00 │ 2013-01-02 23:59:59 (2 rows)
I wrote a table function, that do it for any interval. It is fast - two years range divide to 753 ranges in 10ms
create or replace function day_ranges(timestamp, timestamp) returns table(t1 timestamp, t2 timestamp) as $$ begin t1 := $1; if $2 > $1 then loop if t1::date = $2::date then t2 := $2; return next; exit; end if; t2 := date_trunc('day', t1) + interval '1day' - interval '1sec'; return next; t1 := t2 + interval '1sec'; end loop; end if; return; end; $$ language plpgsql;
Result:
postgres=# select * from day_ranges('2013-10-08 22:00:00', '2013-10-10 23:00:00'); t1 │ t2 ─────────────────────┼───────────────────── 2013-10-08 22:00:00 │ 2013-10-09 23:59:59 2013-10-09 00:00:00 │ 2013-10-09 23:59:59 2013-10-10 00:00:00 │ 2013-10-10 23:00:00 (3 rows) Time: 6.794 ms
and faster (and little bit longer) version based on RETURN QUERY
create or replace function day_ranges(timestamp, timestamp) returns table(t1 timestamp, t2 timestamp) as $$ begin t1 := $1; t2 := $2; if $1::date = $2::date then return next; else -- first day t2 := date_trunc('day', t1) + interval '1day' - interval '1sec'; return next; if $2::date > $1::date + 1 then return query select d, d + interval '1day' - interval '1sec' from generate_series(date_trunc('day', $1 + interval '1day')::timestamp, date_trunc('day', $2 - interval '1day')::timestamp, '1day') g(d); end if; -- last day t1 := date_trunc('day', $2); t2 := $2; return next; end if; return; end; $$ language plpgsql;
Upvotes: 2