Reputation: 839
I've got an almost identical scenario to this question:
How to choose the latest partition in BigQuery table?
With one additional complication. I need the result to display in Data Studio.
The setup
I've got a series of data sets which appear at different time intervals apart and I need to get the most recent partition. Because they're inconsistent time periods apart I can't just get the last day and use that.
I can use BigQuery scripting to successfully prune the queries with a dynamic query, but when I move this query into Data Studio the query doesn't load properly.
The table loads in the data sources part:
But when I actually try to use it in the report:
Data Studio cannot connect to your data set.
Failed to fetch data from the underlying data set
Error ID: e6546a97
Is there a way to get Data Studio to display this properly with pruning?
Example query
DECLARE max_date DATE;
SET max_date = (SELECT DATE(MAX(_partitiontime)) FROM `dataset.table`);
SELECT *
FROM `dataset.table`
WHERE DATE(_partitiontime ) = max_date
Upvotes: 4
Views: 1150
Reputation: 1978
You have access to all your partitions information in the project.dataset.INFORMATION_SCHEMA.PARTITIONS
table.
Therefore you can try something like:
SELECT *
FROM `dataset.table`
WHERE DATE(_PARTITION_TIME) = (
SELECT
MAX(SAFE.PARSE_DATE('%Y%m%d',partition_id))
FROM `dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE TABLE_NAME = "table"
)
Make sure you understand that pruning works with the above query:
Upvotes: 1
Reputation: 1978
A possibility is to use date parameters and make a query like the following:
SELECT *
FROM `dataset.table`
WHERE DATE(_PARTITION_TIME)>= PARSE_DATE("%Y%m%d", @DS_START_DATE)
This does not precisely the answer, but with a date range defaulted to "yesterday to today" data you effectively prune your table to only recent partitions. In the case that the data is irregular as you mention it, users still have the possibility to manually extend the date range until they find the data.
In parallel, you can also add the following custom query to your data sources:
SELECT
MAX(SAFE.PARSE_DATE('%Y%m%d',partition_id)) AS latest_available_partition
FROM `dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE TABLE_NAME = "table"
and display it into a table to provide the information to users.
Indeed, this workaround implies that you trust your end users not to play too much with the date range.
Upvotes: 2