Reputation: 2570
Summarised Question:
How can I query a view that is based off a wildcard table using _TABLE_SUFFIX
as a filter, rather than a column (which would query all the tables)?
e.g.
SELECT Name, date, weight
FROM `my_view`
WHERE _TABLE_SUFFIX >= '2020-01-01'
rather than
SELECT Name, date, weight
FROM `my_view`
WHERE date >= '2020-01-01'
Detailed Question:
Say I have a table bq.school.weights_20191231
with the following format
| Name | Date | Weight |
|-------|------------|--------|
| Bob | 2019-12-31 | 55kg |
| Alex | 2019-12-31 | 95kg |
| ... | ... | .. |
| Sandy | 2019-12-31 | 43kg |
and a table bq.school.weights_20200101
| Name | Date | Weight |
|-------|------------|--------|
| Bob | 2020-01-01 | 60kg |
| Alex | 2020-01-01 | 100kg |
| ... | ... | .. |
| Sandy | 2020-01-01 | 40kg |
And I create a view based off the base table bq.school.weights_*
called weights_view
, e.g.
SELECT Name, Date, Weight
FROM `bq.school.weights_*`
How can I query weights_view
using _TABLE_SUFFIX
to filter the date, rather than using WHERE DATE = "2020-01-01"
(which would query all the tables)?
Upvotes: 1
Views: 5420
Reputation: 1
You can do it directly in the where clause
SELECT Name, Weight
FROM bq.school.weights_*
WHERE _table_suffix >= "20200101"
Upvotes: 0
Reputation: 2570
Solution
Similar to the BigQuery Date-Partitioned Views question, you need to expose the _TABLE_SUFFIX
column, and then query off that, e.g.
SELECT Name, Weight, _TABLE_SUFFIX AS the_date
FROM `bq.school.weights_*
and then query the view with
SELECT *
FROM `weights_view`
WHERE the_date = "20200101"
Upvotes: 1
Reputation: 4032
In addition to @AH's answer, you can simply reference the _TABLE_SUFFIX within your WHERE statement without having to select it and query the view.
I must point that, according to the documentation, the _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard. Also, it does not need to be within the SELECT clause. Below there is an example using a public dataset, it scans two tables gsod1940 and gsod1944,
#standardSQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
`bigquery-public-data.noaa_gsod.gsod*`
WHERE
( _TABLE_SUFFIX = '1940'
OR _TABLE_SUFFIX = '1944' )
ORDER BY
max DESC
Notice that there is a pattern within the table's names noaa_gsod.gsod which is followed by a year. Also, pay attention that the pseudo column _TABLE_SUFFIX is not selected. Although you can select if you desire.
Lastly, I must stress that there are limitations for using wildcards with _TABLE_SUFFIX, here:
The wildcard table functionality does not support views. If the wildcard table matches any view in the dataset, the query returns an error. This is true whether or not your query contains a WHERE clause on the _TABLE_SUFFIX pseudo column to filter out the view.
Currently, cached results are not supported for queries against multiple tables using a wildcard even if the Use Cached Results option is checked. If you run the same wildcard query multiple times, you are billed for each query. Wildcard tables support native BigQuery storage only. You cannot use wildcards when querying an external table or a view.
Queries that contain Data Manipulation Language (DML) statements cannot use a wildcard table as the target of the query. For example, a wildcard table may be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.
Wildcard queries are not supported for tables protected by customer-managed encryption keys (CMEK).
Upvotes: 1