Identify data gaps in data with only timestamps

Question

I'm trying to figure out a query to run against Google BigQuery data, that will allow me to identify where the gaps exist, in data that only has timestamps (i.e. there's not a start and end date), data is being ordered by that timestamp, and there are non-consecutive ID numbers on rows.

Sample data:

Row     localised_sample_date_time  
1       2019-03-21T00:00:29
2       2019-03-21T00:01:29
3       2019-03-21T00:02:29
4       2019-03-21T00:04:29
3       2019-03-21T00:05:29

Each row of data contains a timestamp, and each row of data is expected to occur at a predictable time interval (e.g. 10 mins). I'm looking for a query that illustrates where there are data gaps i.e. when the expected '10-minutely' row isn't present.

I want the data to look something like this:

data island
-------------------------------------------------------
[start timestamp of island] - [end timestamp of island]

e.g.

data island
-------------------------------------------------------
2019-03-21T00:00:29 - 2019-03-21T00:02:29
2019-03-21T00:04:29 - 2019-03-21T00:05:29

Logically I'm looking for something like:

order by timestamp
figure out if consecutive records have a 10 minute gap between them
if they don't, then output the start and end timestamp that represents the data island.

This is a good general resource that explains how to identify data gaps and islands - http://www.kodyaz.com/t-sql/data-islands-and-data-gaps-with-boundaries-using-sql.aspx - it doesn't seem to cover data that is structured in the way mine is though.

I'm thinking there must be something I can use that does something with DATEADD , ORDER BY and GROUP BY, to identify the groups of records that conform to the expected pattern of 10 minute intervals, however I'm not seeing how I need to implement what I'm trying to achieve. What's an elegant, simple solution?

Identify data gaps in data with only timestamps

Answers (1)

Related Questions