Reputation: 15105
I have a query that returns some device state information with device id and timestamp. I am trying to do some daily reports and for that I need to know what state devices were in during the day. So, for example, my query might get me something like this
device id start end state
---------------------------------------------------------
1 2017-01-01 13:38 2017-01-03 12:47 1
2 2017-01-01 03:15 2017-01-02 11:04 1
... more records for devices including devices 1 and 2 ...
what I want to end up with is
device id start end state
---------------------------------------------------------
1 2017-01-01 13:38 2017-01-01 23:59 1
1 2017-01-02 00:00 2017-01-02 23:59 1
1 2017-01-03 00:00 2017-01-03 12:47 1
2 2017-01-01 03:15 2017-01-01 23:59 1
2 2017-01-02 00:00 2017-01-02 11:04 1
What I tried, is something like this
select
l.device_id,
gs.ts as "day",
case when l.start < gs.ts then gs.ts else l.start end as start,
case when l.end > gs.ts + '1 day'::interval then gs.ts + '1 day'::interval else l.end end as end,
l.state
from ( ... my query goes here ... ) as l
right join
(select generate_series(
date 'start date',
date 'end date',
'1 day'::interval)) as gs(ts)
on ((gs.ts, gs.ts + '1 day'::interval) overlaps (l.start, l.end))
order by l.device_id, l.start
Essentially, I right join a sequence of days using overlap function, so every interval that overlaps with that day generates a row and then I cut intervals at the day borders.
Then I use this as a nested select for my daily calculations.
The problem with this approach is that the right join generates a lot of records and the join filter then takes forever. Here is a piece of explain analyze
output
-> Nested Loop Left Join (cost=5371.28..3149290.69 rows=11525332 width=32) (actual time=228.799..32849.000 rows=41197 loops=1)
Join Filter: ... the generate sequence stuff removed for brevity...
Rows Removed by Join Filter: 4994476
As you can see, it generated about 5 million rows, filtered them down to 41K rows and the operation took some 32 seconds.
Is there a more efficient solution to this problem?
Upvotes: 1
Views: 1780
Reputation: 1269443
This should be faster than your current approach:
select q.device_id,
generate_series(start::date, end::date, interval '1 day') as day,
end as day,
state
from (your query here) q;
You can use a subquery to get the exact date/times that you want.
Upvotes: 2