Skice
Skice

Reputation: 461

How do I get the last update time of a sequence of tables in BigQuery?

A BigQuery best practice is to split timeseries in daily tables (as "NAME_yyyyMMdd") and then use Table Wildcards to query one or more of these tables.

Sometimes it is useful to get the last update time on a certain set of data (i.e. to check correctness of the ingestion procedure). How do I get the last update time over a set of tables organized like that?

Upvotes: 3

Views: 5693

Answers (2)

Stas_Shch
Stas_Shch

Reputation: 26

SELECT * FROM project_name.data_set_name.INFORMATION_SCHEMA.PARTITIONS where table_name='my_table';

Solution for Google

Upvotes: 1

Skice
Skice

Reputation: 461

A good way to achieve that is to use the __TABLES__ meta-table. Here is a generic query I use in several projects:

SELECT
  MAX(last_modified_time) LAST_MODIFIED_TIME,
  IF(REGEXP_MATCH(RIGHT(table_id,8),"[0-9]{8}"),LEFT(table_id,LENGTH(table_id) - 8),table_id) AS TABLE_ID
FROM
  [my_dataset.__TABLES__]
GROUP BY
  TABLE_ID

It will return the last update time of every table in my_dataset. For tables organized with a daily-split structure, it will return a single value (the update time of the latest table), with the initial part of their name as TABLE_ID.

Upvotes: 3

Related Questions