Reputation: 11
I have a Google Big Query dataset that contains tables using snapshots (for example: I see a table named .system_users, and there are actually multiple tables named in the system_users[YYYYMMDD] format).
I want to query all of the snapshots for unique values in the column "_schema_push_date", and display the full name of the table. The end result I'm looking for this:
Row | table_name | _schema_push_date
1 | system_users20200101 | 2020-01-01 09:51:29.251 UTC
2 | system_users20200102 | 2020-01-02 08:53:04.017 UTC
I am using
'''WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r"202001([0-2])")'''
to limit table the tables queried, but I think I also need to join _TABLE_SUFFIX against INFORMATION_SCHEMA.TABLES.TABLE_NAME to get table names into the results.
it is possible to join _TABLE_SUFFIX with INFORMATIOIN_SCHEMA data?
Upvotes: 0
Views: 598
Reputation: 11
To include the name of a table when using Wildcard syntax in GBQ, you must add _TABLE_SUFFIX to your select statement and alias it. For example:
select _TABLE_SUFFIX as my_table_name, email, count(1)
FROM `my_project.my_dataset.*`
WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r"202005[0-3]*")
GROUP BY my_table_name, email
Will produce results like
my_table_name | email | f0
my_first_table | [email protected] | 2345
Here's an interesting link to other pseudo-columns: http://bigdata.freeideas.cz/subdom/bigdata/2017/05/30/bigquery-cheat-sheet-standard-sql-meta-tables-pseudo-columns/
Upvotes: 1