A H
A H

Reputation: 2570

BigQuery querying _TABLE_SUFFIX in views with wildcard tables

Summarised Question:

How can I query a view that is based off a wildcard table using _TABLE_SUFFIX as a filter, rather than a column (which would query all the tables)?

e.g.

 SELECT Name, date, weight 
 FROM `my_view`
 WHERE _TABLE_SUFFIX >= '2020-01-01'

rather than

 SELECT Name, date, weight 
 FROM `my_view`
 WHERE date >= '2020-01-01'

Detailed Question:

Say I have a table bq.school.weights_20191231 with the following format

| Name  | Date       | Weight |
|-------|------------|--------|
| Bob   | 2019-12-31 | 55kg   |
| Alex  | 2019-12-31 | 95kg   |
| ...   | ...        | ..     |
| Sandy | 2019-12-31 | 43kg   |

and a table bq.school.weights_20200101

| Name  | Date       | Weight |
|-------|------------|--------|
| Bob   | 2020-01-01 | 60kg   |
| Alex  | 2020-01-01 | 100kg  |
| ...   | ...        | ..     |
| Sandy | 2020-01-01 | 40kg   |

And I create a view based off the base table bq.school.weights_* called weights_view, e.g.

SELECT Name, Date, Weight 
FROM `bq.school.weights_*`

How can I query weights_view using _TABLE_SUFFIX to filter the date, rather than using WHERE DATE = "2020-01-01" (which would query all the tables)?

Upvotes: 1

Views: 5420

Answers (3)

user15164195
user15164195

Reputation: 1

You can do it directly in the where clause

SELECT Name, Weight FROM bq.school.weights_* WHERE _table_suffix >= "20200101"

Upvotes: 0

A H
A H

Reputation: 2570

Solution

Similar to the BigQuery Date-Partitioned Views question, you need to expose the _TABLE_SUFFIX column, and then query off that, e.g.

SELECT Name, Weight, _TABLE_SUFFIX AS the_date
FROM `bq.school.weights_*

and then query the view with

SELECT *
FROM `weights_view`
WHERE the_date = "20200101"

Upvotes: 1

Alexandre Moraes
Alexandre Moraes

Reputation: 4032

In addition to @AH's answer, you can simply reference the _TABLE_SUFFIX within your WHERE statement without having to select it and query the view.

I must point that, according to the documentation, the _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard. Also, it does not need to be within the SELECT clause. Below there is an example using a public dataset, it scans two tables gsod1940 and gsod1944,

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  ( _TABLE_SUFFIX = '1940'
    OR _TABLE_SUFFIX = '1944' )
ORDER BY
  max DESC

Notice that there is a pattern within the table's names noaa_gsod.gsod which is followed by a year. Also, pay attention that the pseudo column _TABLE_SUFFIX is not selected. Although you can select if you desire.

Lastly, I must stress that there are limitations for using wildcards with _TABLE_SUFFIX, here:

  • The wildcard table functionality does not support views. If the wildcard table matches any view in the dataset, the query returns an error. This is true whether or not your query contains a WHERE clause on the _TABLE_SUFFIX pseudo column to filter out the view.

  • Currently, cached results are not supported for queries against multiple tables using a wildcard even if the Use Cached Results option is checked. If you run the same wildcard query multiple times, you are billed for each query. Wildcard tables support native BigQuery storage only. You cannot use wildcards when querying an external table or a view.

  • Queries that contain Data Manipulation Language (DML) statements cannot use a wildcard table as the target of the query. For example, a wildcard table may be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.

  • Wildcard queries are not supported for tables protected by customer-managed encryption keys (CMEK).

Upvotes: 1

Related Questions