BigQuery can use wildcard table names and table_suffix, but I am looking for a similar solution like wildcard datasets and dataset_suffix

Question

So if you process data daily and put the results into the same dataset, such as results, and each day will have the same table name (first part) and with date as table_suffix, such as result1_20190101, result1_20190102 etc., they you query the result tables use wildcard table names and table_suffix.

So your dataset/tables looks like

results/result1_20190101
results/result1_20190102

results/result2_20190101
results/result2_20190102

So I can query all the result1

select * from `xxxx.results.result1*`

But I arrange the results tables differently. Due to I have dozens tables processed each day. so to easily check and manage each day results. I use date as dataset.

so my dataset/tables look like

20190101/result1
20190101/result2
...

20190102/result1
20190102/result2
...

And my daily data process usually will not query cross dates(datasets). the daily results are pushed to next step data pipelines etc.

But once a while, I need to do some quick check, and I need to query across the dates(in my case, across the datasets)

so when I try to query result1, I have to hard code the dataset name.

select * from `xxxxxx.20190101/result1`
union all
select * from `xxxxxx.20190102/result1`
union all
...

1) First question is, are there anyway I could use wildcards and suffix on datasets, like we could with tables?

2) Second question: how could I use the date function, such as DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) to get the date value and use the data value in the below query

select * from `xxxxxx.20190101/result1`
union all
select * from `xxxxxx.20190102/result1`
union all
...

to replace the hard coded value, 20190101, 20190102 etc.

Mikhail Berlyant · Accepted Answer

There is no wildcards and/or suffix available on BigQuery datasets (at least as of today)

Meantime, you can check a feature request for INFORMATION_SCHEMA that is in Alpha now. You can apply for it by submitting form that is available there.

In short: you will be able to query list of datasets in the projects and then use it to construct your query. Please note - you still need to use some sort of client to script all this properly

BigQuery can use wildcard table names and table_suffix, but I am looking for a similar solution like wildcard datasets and dataset_suffix

Answers (1)

Related Questions