Vibhor Gupta
Vibhor Gupta

Reputation: 699

Identify Missing Timestamp Over BigQuery

I have a requirement where i need to find missing Timestamp. Input Data is as given below :-

Row      id       date  
1        x        2021-01-01 10:00:00 UTC
2        x        2021-01-01 10:03:00 UTC
3        x        2021-01-01 10:05:00 UTC
4        x        2021-01-01 10:08:00 UTC
5        y        2021-01-06 10:05:00 UTC
6        y        2021-01-06 10:07:00 UTC
7        y        2021-01-06 10:10:00 UTC

I want Output as, which gives missing timestamps between 2 successive timestamps:-

1        x        2021-01-01 10:01:00 UTC
2        x        2021-01-01 10:02:00 UTC
3        x        2021-01-01 10:04:00 UTC
4        x        2021-01-01 10:06:00 UTC
5        x        2021-01-01 10:07:00 UTC
6        y        2021-01-06 10:06:00 UTC
7        y        2021-01-06 10:08:00 UTC
8        y        2021-01-06 10:09:00 UTC

Upvotes: 0

Views: 193

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Consider below solution - less joins used and most importantly does not generate huge array for ALL minutes between very start and very end data - rather generates such small arrays for ONLY missing minutes. Arrays are memory hogs and affect how performant the query is

select id, missing_date
from (
  select *,
    lag(date) over(partition by id order by date) prev_date
  from `project.dataset.table` t
),
unnest(generate_timestamp_array(timestamp_add(prev_date, interval 1 minute), timestamp_sub(date, interval 1 minute), interval 1 minute)) missing_date 
where timestamp_diff(date, prev_date, minute) > 1    

If applied to sample data in your question - output is

enter image description here

Upvotes: 3

Sergey Geron
Sergey Geron

Reputation: 10152

Try GENERATE_TIMESTAMP_ARRAY:

with mytable as (
  select 'x' as id, timestamp '2021-01-01 10:00:00 UTC' as date union all
  select 'x', timestamp '2021-01-01 10:03:00 UTC' union all
  select 'x', timestamp '2021-01-01 10:05:00 UTC' union all
  select 'x', timestamp '2021-01-01 10:08:00 UTC' union all
  select 'y', timestamp '2021-01-06 10:05:00 UTC' union all
  select 'y', timestamp '2021-01-06 10:07:00 UTC' union all
  select 'y', timestamp '2021-01-06 10:10:00 UTC'
)
select id, missing.date 
from mytable full join (
  select * 
  from (
    select id, GENERATE_TIMESTAMP_ARRAY(min(date), max(date), interval 1 minute) as date_array
    from mytable
    group by id
  ), unnest(date_array) as date
) as missing using (id, date)
where mytable.date is null

enter image description here

Upvotes: 1

Related Questions