sethu
sethu

Reputation: 331

I want to reduce the query cost in bigquery while performing the delete operation

In detail,

I have a table in bigquery called tableA which has partitioned column called date_key(type integer)(Size is 500 GB) i want to delete the data based on date_key column on daily basis to maintain the data quality.

Here is the key point when i try to perform below operation:

delete from tableA where date_key in (20200729);

the query cost is 2.5 gb hence I am performing delete operation on partitioned column and I am good with this

So here comes the problem date key should be dynamically passed so I am writing a sub query to pass the date_key as shown below

delete from tableA where date_key in (select distinct date_key in stg_tableA)--which will give you 2020029 as output from the subquery

In this case the query cost is 500 GB Strange do not know why?

I have tried setting the variables then also its not working code:

--declare a,b int64;

DECLARE a DEFAULT (select distinct date_key from stg_tableA);

delete from TableA where date_key in(a)

enter image description here

I have tried in this way as it is working when i hard code the date_key value --declare a,b int64;

DECLARE a DEFAULT 20200727;

delete from tableA where date_key in(a)

enter image description here

Upvotes: 0

Views: 560

Answers (1)

Hua Zhang
Hua Zhang

Reputation: 1551

Declare a variable should work. The estimated cost you saw are from dry-run the script. And in dry-run all script variables will not be set, and thus partition pruning doesn't happen. This is by design. When you actually run the script the variable will be set to the value of the first query and partition pruning should happen in the second query.

Upvotes: 0

Related Questions