Reputation: 39
I am trying to benchmark the performance of a DELETE + INSERT query in BigQuery.
My query looks like:
DELETE FROM target_table A
WHERE <partition_col> = 5
  AND EXISTS (
    SELECT 1 FROM staging_table B
    WHERE A.col1 = B.col1
      AND A.col2 = B.col2
      AND A.col3 = B.col3
      AND A.col4 = B.col4
      AND B.date >= A.date
  );

INSERT INTO target_table SELECT * FROM staging_table;
Basically I need to delete the incremental key data from the target table and then insert the incremental data from the staging table into the target.
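For context, the two statements can also be wrapped in a BigQuery multi-statement transaction so the delete and insert commit together. This is only a sketch based on the tables in the question; `<partition_col>` is left as a placeholder for the actual partition column:

BEGIN TRANSACTION;

-- Remove rows for the incremental partition that also exist in staging
DELETE FROM target_table A
WHERE <partition_col> = 5
  AND EXISTS (
    SELECT 1 FROM staging_table B
    WHERE A.col1 = B.col1
      AND A.col2 = B.col2
      AND A.col3 = B.col3
      AND A.col4 = B.col4
      AND B.date >= A.date
  );

-- Reload the incremental data from staging
INSERT INTO target_table
SELECT * FROM staging_table;

COMMIT TRANSACTION;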
When I run this job for the very first time, the stats are:
slot time: 2 hr 7 min, bytes shuffled: 135.59 GB, bytes processed: 47.5 GB
The second/subsequent run stats are:
slot time: 31 min 31 sec, bytes shuffled: 105.57 MB, bytes processed: 5.31 GB
In the execution graph I can see that the REPARTITION stage does not happen in subsequent runs. It happens only in the very first run, which in turn reduces the data shuffled in the execution graphs of subsequent runs.
EDIT: After each delete, I re-insert the data from the staging table into the target table. I then update one date field in the staging table and rerun the DELETE statement (just so the staging table behaves like an incremental dataset with one date field updated). The first two images show the execution graph of the first delete, and the next two show the second delete statement.
Can someone please explain this behaviour?
Upvotes: 0
Views: 189