Michel Hua
Michel Hua

Reputation: 1777

Optimize SELECT MAX(timestamp) query

I would like to run this query about once every 5 minutes to be able to run an incremental query to MERGE to another table.

SELECT MAX(timestamp) FROM dataset.myTable

-- timestamp is of type TIMESTAMP

My concern is that will do a full scan of myTable on a regular basis.

What are the best practices for optimizing this query? Will partitioning help even if the SELECT MAX doesn't extract the date from the query? Or is it just the columnar nature of BigQuery will make this optimal?

Thank you.

Upvotes: 1

Views: 779

Answers (1)

Cylldby
Cylldby

Reputation: 1978

What you can do is, instead of querying your table directly, query the INFORMATION_SCHEMA.PARTITIONS table within your dataset. Doc here.

You can for instance go for:

SELECT LAST_MODIFIED_TIME
FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE TABLE_NAME = "myTable"

The PARTITIONS table hold metadata at the rate of one record for each of your partitions. It is therefore greatly smaller than your table and it's an easy way to cut your query costs. (it is also much faster to query).

Upvotes: 2

Related Questions