Dynamic table suffix with BigQuery does not limit processed data

Question

I've been trying to dynamically select from specific tables to reduce the total processed data. Using _TABLE_SUFFIX seems to work when the filters are set manually as follows

Note that declared variables aren't an option here, this is a scheduled query using the BigQuery web UI, and using declared variables seems to prevent saving the output table.

SELECT * FROM SOME_TABLE
WHERE _TABLE_SUFFIX BETWEEN "20200101" and FORMAT_DATE("%Y%m%d", CURRENT_DATE())

>>> 50MB PROCESSED

However, dynamically setting _TABLE SUFFIX based off some other table seems to do a full table scan. Note that the selected value is the same here.

SELECT * FROM SOME_TABLE
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", (SELECT MAX(date) FROM OTHER TABLE)) and FORMAT_DATE("%Y%m%d", CURRENT_DATE())

>>> 350mb PROCESSED

Felipe Hoffa · Accepted Answer

If your tables were clustered, this would not be a problem at all.

See:

https://stackoverflow.com/a/60895770/132438

Now, if you want to limit the amount of data queried for these tables, you can use scripting:

DECLARE begin_date STRING 
  DEFAULT (FORMAT_DATE("%Y%m%d", (SELECT MAX(date) FROM OTHER TABLE)));

SELECT * FROM SOME_TABLE
WHERE _TABLE_SUFFIX BETWEEN begin_date and FORMAT_DATE("%Y%m%d", CURRENT_DATE())

Dynamic table suffix with BigQuery does not limit processed data

Answers (1)

Related Questions