Reputation: 1777
I would like to run the following query about once every 5 minutes, so that I can run an incremental query that MERGEs new rows into another table:
SELECT MAX(timestamp) FROM dataset.myTable
-- timestamp is of type TIMESTAMP
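For context, the incremental MERGE it feeds looks something like the sketch below (the target table, key column, and value column are made-up placeholders, and @last_ts stands for the result of the MAX query above):

```sql
-- Hypothetical incremental merge; targetTable, id and value are placeholders.
MERGE `dataset.targetTable` T
USING (
  SELECT id, value, timestamp
  FROM `dataset.myTable`
  WHERE timestamp > @last_ts  -- only rows newer than the last run
) S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.value = S.value, T.timestamp = S.timestamp
WHEN NOT MATCHED THEN
  INSERT (id, value, timestamp) VALUES (S.id, S.value, S.timestamp)
```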
My concern is that this will do a full scan of myTable on a regular basis.
What are the best practices for optimizing this query? Will partitioning help even though the SELECT MAX doesn't filter on the partitioning column? Or does the columnar nature of BigQuery already make this optimal?
Thank you.
Upvotes: 1
Views: 779
Reputation: 1978
What you can do is, instead of querying your table directly, query the INFORMATION_SCHEMA.PARTITIONS view for your dataset. Doc here.
You can for instance go for:
SELECT MAX(LAST_MODIFIED_TIME)
FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE TABLE_NAME = "myTable"
The PARTITIONS view holds metadata at the rate of one row per partition. It is therefore much smaller than your table, which makes this an easy way to cut your query costs (it is also much faster to query).
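Putting it together, one way to drive the incremental run from this metadata is a BigQuery script that captures the watermark first, then uses it to prune the scan. This is only a sketch: targetTable, id, and value below are hypothetical names, and it assumes myTable is partitioned on timestamp so the WHERE clause actually prunes partitions.

```sql
-- Sketch of an incremental load driven by partition metadata.
-- targetTable, id and value are hypothetical; adapt to your schema.
DECLARE last_ts TIMESTAMP DEFAULT (
  SELECT MAX(LAST_MODIFIED_TIME)
  FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
  WHERE TABLE_NAME = "myTable"
);

MERGE `project.dataset.targetTable` T
USING (
  SELECT id, value, timestamp
  FROM `project.dataset.myTable`
  WHERE timestamp > last_ts  -- prunes partitions if myTable is partitioned on timestamp
) S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.value = S.value, T.timestamp = S.timestamp
WHEN NOT MATCHED THEN
  INSERT (id, value, timestamp) VALUES (S.id, S.value, S.timestamp)
```

Note that LAST_MODIFIED_TIME tells you when a partition was last written, not the maximum value of your timestamp column, so you may want to persist the watermark from each successful run instead of recomputing it from metadata.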
Upvotes: 2