Reputation: 193
I am trying to make a new clustered table, db.new_table
, that have the same data and schema as an existing table, db.old_table
, in BigQuery. The existing table have a pseudo column _PARTITIONTIME
, and I would like the new table to have this _PARTITIONTIME
pseudo column as well.
I have tried using DDL, with query like:
CREATE TABLE `db.new_table`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY field1, field2
AS SELECT * FROM `db.old_table`
WHERE _PARTITIONTIME > '1990-01-01'
However it failed because we cannot use PARTITION BY DATE(_PARTITIONTIME)
followed by AS SELECT ...
. as stated in https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language
Is there a methods to do this? (create a new clustered table with the same exact schema and data from an old table partitioned by pseudo column _PARTITIONTIME )
Any answer and comment is appreciated, thanks.
Notes: I can create a similar table without _PARTITIONTIME with query like:
CREATE TABLE `db.new_table`
PARTITION BY partition_date
CLUSTER BY field1, field2
AS SELECT DATE(_PARTITIONTIME) AS partition_date, * FROM `db.old_table`
WHERE _PARTITIONTIME > '1990-01-01'
However since a lot of things in the system depend on db.old_table
, the change in partition field from _PARTITIONTIME
to partition_date
would cause a lot of query changes... Therefore it would be much preferable if we can create the clustered table with exactly same schema and data.
Upvotes: 2
Views: 7217
Reputation: 7277
You can just pre-create your day-partitioned, clustered table (on whatever fields) named db.new_table
using either BQ UI or bq
command.
Once the table is there, you can populate "for each day" as:
bq query --allow_large_results --append_table --noflatten_results --destination_table 'db.new_table$19900101' "select field1, field2, field3 from db.old_table where _PARTITIONTIME = '1990-01-01'";
Notice two things:
db.new_table$19900101
points to the partition of 1990-01-01
in db.new_table
.Upvotes: 1