Yosua Michael
Yosua Michael

Reputation: 193

Create a clustered table in BigQuery from existing table with _PARTITIONTIME

I am trying to make a new clustered table, db.new_table, that have the same data and schema as an existing table, db.old_table, in BigQuery. The existing table have a pseudo column _PARTITIONTIME, and I would like the new table to have this _PARTITIONTIME pseudo column as well.

I have tried using DDL, with query like:

CREATE TABLE `db.new_table`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY field1, field2
AS SELECT * FROM `db.old_table`
WHERE _PARTITIONTIME > '1990-01-01'

However it failed because we cannot use PARTITION BY DATE(_PARTITIONTIME) followed by AS SELECT .... as stated in https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language

Is there a methods to do this? (create a new clustered table with the same exact schema and data from an old table partitioned by pseudo column _PARTITIONTIME )

Any answer and comment is appreciated, thanks.


Notes: I can create a similar table without _PARTITIONTIME with query like:

CREATE TABLE `db.new_table`
PARTITION BY partition_date
CLUSTER BY field1, field2
AS SELECT DATE(_PARTITIONTIME) AS partition_date, * FROM `db.old_table`
WHERE _PARTITIONTIME > '1990-01-01'

However since a lot of things in the system depend on db.old_table, the change in partition field from _PARTITIONTIME to partition_date would cause a lot of query changes... Therefore it would be much preferable if we can create the clustered table with exactly same schema and data.

Upvotes: 2

Views: 7217

Answers (1)

saifuddin778
saifuddin778

Reputation: 7277

You can just pre-create your day-partitioned, clustered table (on whatever fields) named db.new_table using either BQ UI or bq command.

Once the table is there, you can populate "for each day" as:

bq query --allow_large_results --append_table --noflatten_results --destination_table 'db.new_table$19900101' "select field1, field2, field3 from db.old_table where _PARTITIONTIME = '1990-01-01'";

Notice two things:

  • You have to run this query for each day separately (which will cost you pretty much the same so don't worry about it).
  • The db.new_table$19900101 points to the partition of 1990-01-01 in db.new_table.

Upvotes: 1

Related Questions