Reputation: 371
say I have a table t1
which is date partitioned on column sdate
I have another table t2
which is also partitioned on column pdate
which has data already in some partitions (20200101 to 20200130)
Now, I want to overwrite few partitions ( 20200105 to 20200110) in t2
from data in t1
and I want to do it in a single query and not one query per partition.
Select Query:
SELECT sdate,
c1, c2
FROM `dataset.t1` where sdate between "2020-01-05" and "2020-01-10"
I have seen examples where we can specify destination table like t2$20200102
, but that will overwrite only a single partition.
How can we write insert statement to populate selected partitions
Any ideas of how the insert query would look like?
Upvotes: 1
Views: 1954
Reputation: 7058
You can simply use SELECT sdate AS pdate
in your INSERT
statement:
INSERT dataset.t2 (pdate, c1, c2)
SELECT sdate AS pdate, c1, c2
FROM dataset.t1
WHERE sdate BETWEEN "2020-01-05" and "2020-01-10"
Of course, this will append the rows to each target partition so, if you want to overwrite them, you can delete them first with:
DELETE
dataset.t2
WHERE
pdate BETWEEN "2020-01-05" AND "2020-01-10"
Obviously, you should be careful when deleting partitions. An extra check can be added if you only want to overwrite dates with data in the source table:
DELETE
dataset.t2
WHERE
pdate IN (
SELECT sdate
FROM dataset.t1
WHERE sdate BETWEEN "2020-01-05" AND "2020-01-10")
You can also consider using MERGE
but then you would need to take into account what happens when matching row by row.
I did some tests with this gist.
Another option is to do a piece-wise SELECT
query to break the data into three date ranges and UNION ALL
to bring them together.
SELECT * FROM dataset.t2
WHERE pdate < "2020-01-05" OR pdate > "2020-01-10"
UNION ALL
SELECT * FROM dataset.t1
WHERE sdate BETWEEN "2020-01-05" AND "2020-01-10"
This requires specifying a destination table with WRITE_TRUNCATE
and runs in a single job but has to essentially re-write the whole table.
Upvotes: 1