Pruthviraj Shivanna
Pruthviraj Shivanna

Reputation: 121

Is there a metadata table to check if the table in BigQuery is partitioned?

I have the following requirement.

We have 1400 sharded tables in BQ based on specific functional grouping. Maximum of these individual sharded tables are also time-partitioned for DAY.

Smaller tables are not time-partitioned.

I am trying to create views dynamically in BigQuery by doing a UNION of all the sharded tables and writing a WHERE filter condition on _partitiontime.

But the problem is there could be smaller tables which are not time-partitioned, the query fails.

There are many more tables created on daily basis and I cant settle for a static exclusion solution since that would require manually maintaining a file with table names etc. (Apache Beam + BQ uses the data that is coming in and creates new sharded tables for newer functional groups without manual intervention.)

I am trying to exclude the non time-partitioned tables by using bq utility to connect to BQ more than 1000 times to check if a table is time-partitioned.

bq show --format=prettyjson and check the timepartitioning type field.

This is very slow, more than 30 minutes.

I have tried the __TABLES_SUMMARY__, but it does not have the partitioning info.

I have also checked SELECT partition_id from [mydataset.table1$__PARTITIONS_SUMMARY__];, but this works if the table is already partitioned.

Upvotes: 12

Views: 12443

Answers (3)

Gulshan
Gulshan

Reputation: 151

You can write the below query in standard SQL in BQ to get the information

SELECT * 
  FROM `project_id.dataset_name.INFORMATION_SCHEMA.COLUMNS`
  WHERE TABLE_NAME = Table_Name 
    AND is_partitioning_column = "YES"

Upvotes: 15

Eric Samuel
Eric Samuel

Reputation: 1

You can write a function that runs a query like this with try and except, the query will error if the table is not partitioned.

SELECT partition_id FROM [%s.%s$__PARTITIONS_SUMMARY__]' % (dataset_id, table_id)

Upvotes: 0

Adam Lydick
Adam Lydick

Reputation: 1122

Unfortunately, there is no way to determine this dynamically.

As a workaround, I would suggest keeping your partitioned tables and your non-partitioned tables in distinct datasets OR including information about whether the table is partitioned in its name.

You could also maintain a list of table prefixes that are known to be partitioned (or non-partitioned, if that is easier to maintain).

You can also open a feature request on the public issue tracker for BigQuery: https://issuetracker.google.com/issues/new?component=187149&template=0

Upvotes: 1

Related Questions