Reputation: 331
In detail,
I have a table in bigquery called tableA which has partitioned column called date_key(type integer)(Size is 500 GB) i want to delete the data based on date_key column on daily basis to maintain the data quality.
Here is the key point when i try to perform below operation:
delete from tableA where date_key in (20200729);
the query cost is 2.5 gb hence I am performing delete operation on partitioned column and I am good with this
So here comes the problem date key should be dynamically passed so I am writing a sub query to pass the date_key as shown below
delete from tableA where date_key in (select distinct date_key in stg_tableA)--which will give you 2020029 as output from the subquery
In this case the query cost is 500 GB Strange do not know why?
I have tried setting the variables then also its not working code:
--declare a,b int64;
DECLARE a DEFAULT (select distinct date_key from stg_tableA);
delete from TableA where date_key in(a)
I have tried in this way as it is working when i hard code the date_key value --declare a,b int64;
DECLARE a DEFAULT 20200727;
delete from tableA where date_key in(a)
Upvotes: 0
Views: 560
Reputation: 1551
Declare a variable should work. The estimated cost you saw are from dry-run the script. And in dry-run all script variables will not be set, and thus partition pruning doesn't happen. This is by design. When you actually run the script the variable will be set to the value of the first query and partition pruning should happen in the second query.
Upvotes: 0