Optimize SELECT MAX(timestamp) query

Question

I would like to run this query about once every 5 minutes to be able to run an incremental query to MERGE to another table.

SELECT MAX(timestamp) FROM dataset.myTable

-- timestamp is of type TIMESTAMP

My concern is that will do a full scan of myTable on a regular basis.

What are the best practices for optimizing this query? Will partitioning help even if the SELECT MAX doesn't extract the date from the query? Or is it just the columnar nature of BigQuery will make this optimal?

Thank you.

Cylldby · Accepted Answer

What you can do is, instead of querying your table directly, query the INFORMATION_SCHEMA.PARTITIONS table within your dataset. Doc here.

You can for instance go for:

SELECT LAST_MODIFIED_TIME
FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE TABLE_NAME = "myTable"

The PARTITIONS table hold metadata at the rate of one record for each of your partitions. It is therefore greatly smaller than your table and it's an easy way to cut your query costs. (it is also much faster to query).

Optimize SELECT MAX(timestamp) query

Answers (1)

Related Questions