Reputation: 85
I've been trying to dynamically select from specific tables to reduce the total data processed. Using _TABLE_SUFFIX works when the filter values are set manually, as follows.
Note that declared variables aren't an option here: this is a scheduled query in the BigQuery web UI, and using declared variables seems to prevent saving to an output table.
SELECT * FROM `SOME_TABLE_*`
WHERE _TABLE_SUFFIX BETWEEN "20200101" AND FORMAT_DATE("%Y%m%d", CURRENT_DATE())
>>> 50MB PROCESSED
However, dynamically setting _TABLE_SUFFIX based on a value from another table seems to trigger a full table scan. Note that the selected value is the same here.
SELECT * FROM `SOME_TABLE_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", (SELECT MAX(date) FROM OTHER_TABLE)) AND FORMAT_DATE("%Y%m%d", CURRENT_DATE())
>>> 350MB PROCESSED
Upvotes: 0
Views: 2203
Reputation: 59375
If your tables were clustered, this would not be a problem at all: clustered tables prune blocks at run time, so even a dynamically computed filter reduces the bytes billed (the upfront cost estimate may still show a full scan).
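A minimal sketch of that approach, assuming the date-sharded tables are consolidated into a single clustered table (all table and column names here are hypothetical):
CREATE TABLE my_dataset.some_table_clustered
CLUSTER BY event_date AS
SELECT PARSE_DATE("%Y%m%d", _TABLE_SUFFIX) AS event_date, t.*
FROM `SOME_TABLE_*` AS t;

-- The dynamic filter now prunes clustered blocks at run time instead of scanning everything.
SELECT *
FROM my_dataset.some_table_clustered
WHERE event_date BETWEEN (SELECT MAX(date) FROM OTHER_TABLE) AND CURRENT_DATE();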
Now, if you want to limit the amount of data queried for these tables, you can use scripting:
DECLARE begin_date STRING
  DEFAULT (FORMAT_DATE("%Y%m%d", (SELECT MAX(date) FROM OTHER_TABLE)));

SELECT * FROM `SOME_TABLE_*`
WHERE _TABLE_SUFFIX BETWEEN begin_date AND FORMAT_DATE("%Y%m%d", CURRENT_DATE());
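If this needs to run as a scheduled query, one workaround (sketched below with a hypothetical destination name) is to write the result from inside the script, since a multi-statement scheduled query cannot have a destination table configured in the UI:
DECLARE begin_date STRING
  DEFAULT (FORMAT_DATE("%Y%m%d", (SELECT MAX(date) FROM OTHER_TABLE)));

-- Write the output explicitly instead of relying on a configured destination table.
CREATE OR REPLACE TABLE my_dataset.daily_output AS
SELECT * FROM `SOME_TABLE_*`
WHERE _TABLE_SUFFIX BETWEEN begin_date AND FORMAT_DATE("%Y%m%d", CURRENT_DATE());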
Upvotes: 1