Chandan Bhattad
Chandan Bhattad

Reputation: 371

Overwriting multiple partitions bigquery

say I have a table t1 which is date partitioned on column sdate

I have another table t2 which is also partitioned on column pdate which has data already in some partitions (20200101 to 20200130)

Now, I want to overwrite few partitions ( 20200105 to 20200110) in t2 from data in t1

and I want to do it in a single query and not one query per partition.

Select Query:

SELECT sdate,
c1, c2

FROM `dataset.t1` where sdate between "2020-01-05" and "2020-01-10"

I have seen examples where we can specify destination table like t2$20200102, but that will overwrite only a single partition.

How can we write insert statement to populate selected partitions

Any ideas of how the insert query would look like?

Upvotes: 1

Views: 1954

Answers (1)

Guillem Xercavins
Guillem Xercavins

Reputation: 7058

You can simply use SELECT sdate AS pdate in your INSERT statement:

INSERT dataset.t2 (pdate, c1, c2) 
  SELECT sdate AS pdate, c1, c2
  FROM dataset.t1
  WHERE sdate BETWEEN "2020-01-05" and "2020-01-10"

Of course, this will append the rows to each target partition so, if you want to overwrite them, you can delete them first with:

DELETE
  dataset.t2
WHERE
  pdate BETWEEN "2020-01-05" AND "2020-01-10"

Obviously, you should be careful when deleting partitions. An extra check can be added if you only want to overwrite dates with data in the source table:

DELETE
  dataset.t2
WHERE
  pdate IN (
    SELECT sdate 
    FROM dataset.t1 
    WHERE sdate BETWEEN "2020-01-05" AND "2020-01-10")

You can also consider using MERGE but then you would need to take into account what happens when matching row by row.

I did some tests with this gist.

Another option is to do a piece-wise SELECT query to break the data into three date ranges and UNION ALL to bring them together.

SELECT * FROM dataset.t2
WHERE pdate < "2020-01-05" OR pdate > "2020-01-10"
UNION ALL
SELECT * FROM dataset.t1
WHERE sdate BETWEEN "2020-01-05" AND "2020-01-10"

This requires specifying a destination table with WRITE_TRUNCATE and runs in a single job but has to essentially re-write the whole table.

Upvotes: 1

Related Questions