russianmax
russianmax

Reputation: 177

BigQuery - is there a way to find missing dates in a data sequence?

We have setup up some GA360 exports directly live to our BigQuery and how it works that for every one days GA360 export we have a table created in the view. So if you have 400 days worth of GA360 data you should have 400 tables.

From looking at these I have noticed that for some of these views the total amount of tables doesn't match up which means that we could be potentially missing some data. Would there be a way to find the tables/dates that we are missing?

I'm running this query - this give us the dataset and table IDs. I have parsed the table IDs into the date format so we easier to see what dates are there and which aren't. Is there a way to find the missing the dates from the results of the query below

SELECT 'xxx' as dataset_id, 
PARSE_DATE('%Y%m%d',RIGHT(table_id,8)) as date 
FROM `project.xxx.__TABLES__` where table_id like 'ga_sessions_2%'
  

Output

dataset_id date
123456 2022-04-11
123456 2022-04-12
123456 2022-06-01
123456 2022-06-02

Is it possible to modify query above so that there's a record for each date that are we missing between 2022-04-12 and 2022-06-01?

Upvotes: 1

Views: 1128

Answers (1)

Jaytiger
Jaytiger

Reputation: 12234

Consider below query for missing dates:

CREATE TEMP TABLE sample AS 
SELECT '123456' dataset_id, DATE '2022-04-11' date UNION ALL
SELECT '123456' dataset_id, DATE '2022-04-12' date UNION ALL
SELECT '123456' dataset_id, DATE '2022-06-01' date UNION ALL
SELECT '123456' dataset_id, DATE '2022-06-02' date;

SELECT date FROM UNNEST(GENERATE_DATE_ARRAY('2022-04-12', '2022-06-01')) date
EXCEPT DISTINCT
SELECT date FROM sample;
Multiple Dataset Example
SELECT dataset_id, date FROM (
  SELECT DISTINCT dataset_id FROM sample
),UNNEST(GENERATE_DATE_ARRAY('2022-04-12', '2022-06-01')) date
EXCEPT DISTINCT
SELECT dataset_id, date FROM sample;

Upvotes: 3

Related Questions