mrag
mrag

Reputation: 81

Any way to loop through all tables in a dataset in a BigQuery standard SQL script?

I have many daily tables (named by _YYYYMMDD suffix) in a dataset. I need to write a daily Scheduled Query to loop through each daily table, perform a join with another dynamically changing table, and overwrite that day's table all in BigQuery. I was looking into "Scripting in standard SQL" and wondered if there might be a way to iterate over _YYYYMMDD tables in a loop?

https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting

Thanks in advance!

Upvotes: 1

Views: 5998

Answers (1)

Alexandre Moraes
Alexandre Moraes

Reputation: 4032

I would suggest you to use Wildcard tables, which allows you to use some features in Standard SQL with BigQuery. You can read more and see some examples here.

Furthermore, the syntax you are looking for should be similar to this:

SELECT 
#columns that you want
FROM
  `<project-id>.<dataset-id>.<table-prefix>*`

The asterisk represents which tables the select will scan, in this case every table with the same prefix. On the other hand, you can also use _TABLE_SUFFIX to restrict scans of your tables, you can read more bout it here. Also, below there is a sample of how it would look like:

SELECT
  ROUND((max-32)*5/9,1) celsius
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  _TABLE_SUFFIX = (SELECT SUBSTR(MAX(table_id), LENGTH('gsod19') + 1)
      FROM `bigquery-public-data.noaa_gsod.__TABLES_SUMMARY__`
      WHERE table_id LIKE 'gsod194%')

I hope it helps.

ANSWER UPDATE (after user comments):

There is a method called UPDATE in BigQuery, which allows the user upodate data from certain columns inside a dataset. In you case that might be a appropriate solution. The syntax for using update is as follows:

UPDATE sample_db.UserSessions
SET ip_address = REGEXP_REPLACE(ip_address, r"(\.[0-9]+)$", ".0")
WHERE TRUE

Upvotes: 1

Related Questions