Romillion
Romillion

Reputation: 159

Postgres search missing date in hourly sequence

I have a table period_of_hours in PostgreSQL. This table contains several column but for my case is important column timestamp (timestamp without time zone) and key(character varying). I have a scheduler which add a new row to the table with period of hour. I don't want to bother you why but sometimes it skip adding to the table a new row.

As you can see from this example period from 2024-05-06 10:00:00 to 2024-05-06 15:00:00 contains all lines consecutively, then one line is skipped 2024-05-06 16:00:00. From 2024-05-06 17:00:00 to 2024-05-06 21:00:00 is Ok and two rows is missed 2024-05-06 22:00:00 and 2024-05-06 23:00:00 than 2024-05-07 02:00:00 also missed.

Table "period_of_hours"
timestamp            key 

2024-05-06 10:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 11:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 12:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 13:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 14:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 15:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-06 17:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 18:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 19:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 20:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 21:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 00:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 01:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 03:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 04:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 05:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'


2024-05-06 10:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 11:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 12:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 13:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 14:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 15:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-06 17:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 18:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 19:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 20:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 21:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 00:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 01:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 03:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 04:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 05:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

First of all i want to select all rows by key (for example '009eae73-0b55-3809-9938-4bc0a342e451') and filter result set to get all missed rows ? In real table it is more than 1000 rows.

Expected result for key column '009eae73-0b55-3809-9938-4bc0a342e451':

2024-05-06 16:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 22:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 23:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 02:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

This solution is works for me.

with continuous("timestamp") as (
  select 
    '2024-05-06 10:00:00' :: timestamp + n * '1 hour' :: interval 
  FROM 
    generate_series(
      0, 
      (
        select 
          extract(
            epoch 
            from 
              max(timestamp)- min(
                '2024-05-06 10:00:00' :: timestamp
              )
          )/ 3600 
        from 
          period_of_hours 
        where 
          "key" = '0a3e1588-ad59-3586-b071-d5001f5ff9a7'
      ):: integer, 
      1
    ) as n
) 
select 
  c.timestamp 
from 
  continuous c 
  left join period_of_hours p on c.timestamp = p.timestamp 
  and p.key = '0a3e1588-ad59-3586-b071-d5001f5ff9a7' 
where 
  p.timestamp is null;

is it possible to combine key in construction and solution above because i can have a list of UUID ?

key in (
    'a63ffce5-1d86-3afc-8b5a-97452f935632', 
    '0a3e1588-ad59-3586-b071-d5001f5ff9a7',
    'e5524b9b-3aca-3b80-9aab-19bfec30fb9b')```

I dont understand whats is continuous("timestamp") ? It is a function or what ?

Upvotes: 1

Views: 83

Answers (1)

Zegarek
Zegarek

Reputation: 26586

You can generate a continuous version of your data set and run an anti-join against it: demo

with continuous("timestamp") as (
select '2024-05-06 10:00:00'::timestamp+n*'1 hour'::interval
from generate_series( 0
                     ,(select extract(epoch from max("timestamp")
                                                -min("timestamp"))/3600
                      from period_of_hours
                      where "key"='009eae73-0b55-3809-9938-4bc0a342e451')
                     ,1)as n)
select c."timestamp" from continuous c
left join period_of_hours p 
  on c."timestamp"=p."timestamp"
 and p."key"='009eae73-0b55-3809-9938-4bc0a342e451'
where p."timestamp" is null;
timestamp
2024-05-06 16:00:00
2024-05-06 22:00:00
2024-05-06 23:00:00
2024-05-07 02:00:00

An except, a not in or <>all() could achieve the same.

You could also treat this as gaps-and-islands problem and run a window function to spot the gaps and report their width, then generate a patch of rows using that:

select "timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
               /3600 as "diff"
      from period_of_hours
      where "key"='009eae73-0b55-3809-9938-4bc0a342e451'
      window w1 as(order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;

If you want to list each key with all its gaps, it's enough to change the window definition a bit:

select "key","timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
               /3600 as "diff"
      from period_of_hours
      window w1 as(partition by "key" order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;
key timestamp
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 16:00:00
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 22:00:00
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 23:00:00
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-07 02:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 16:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 22:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 23:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-07 02:00:00

Upvotes: 0

Related Questions