Dave Reeck
Dave Reeck

Reputation: 11

How can I join INFORMATION_SCHEMA metadata with wildcard table syntax like _TABLE_SUFFIX, so I can get results by table_name in Google Big Query

I have a Google Big Query dataset that contains tables using snapshots (for example: I see a table named .system_users, and there are actually multiple tables named in the system_users[YYYYMMDD] format).

I want to query all of the snapshots for unique values in the column "_schema_push_date", and display the full name of the table. The end result I'm looking for this:

Row    | table_name               | _schema_push_date 
1      | system_users20200101     | 2020-01-01 09:51:29.251 UTC
2      | system_users20200102     | 2020-01-02 08:53:04.017 UTC

I am using

'''WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r"202001([0-2])")'''

to limit table the tables queried, but I think I also need to join _TABLE_SUFFIX against INFORMATION_SCHEMA.TABLES.TABLE_NAME to get table names into the results.

it is possible to join _TABLE_SUFFIX with INFORMATIOIN_SCHEMA data?

Upvotes: 0

Views: 598

Answers (1)

Dave Reeck
Dave Reeck

Reputation: 11

To include the name of a table when using Wildcard syntax in GBQ, you must add _TABLE_SUFFIX to your select statement and alias it. For example:

  select  _TABLE_SUFFIX as my_table_name, email, count(1)
  FROM `my_project.my_dataset.*`
  WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r"202005[0-3]*")
  GROUP BY my_table_name, email

Will produce results like

my_table_name     | email        | f0
my_first_table    | [email protected] | 2345

Here's an interesting link to other pseudo-columns: http://bigdata.freeideas.cz/subdom/bigdata/2017/05/30/bigquery-cheat-sheet-standard-sql-meta-tables-pseudo-columns/

Upvotes: 1

Related Questions