Mani Shankar.S
Mani Shankar.S

Reputation: 39

Bigquery Repartition stages are not happened in the execution plan, when I ran the query for second time

I am trying to benchmark the performance of Delete + Insert query in Bigquery.

myquery looks like

delete from target_table A where <partition_col> = 5 and exists
(Select 1 from staging_table B where
  A.col1 = b.col1  and
  A.col2 = B.col2 and 
  A.col3 = B.col3 and 
  A.col4 = B.col4 and
  B.date >= A.date )

 insert into target_table select * from staging_table

Basically I need to delete the incremental key data from target and insert the incremental data into target FROM STAGING TABLE.

But When I try to run this job for the very first time.

stats are

slot time - 2hr 7min, bytes shuffled 135.59gb, bytes processed 47.5gb

second run/subsequent run stats are

slot time 31min 31sec, bytes shuffled 105.57mb, bytes processed 5.31gb

I can see in execution graph REPARTITION stage is not happened in subsequent run. It happens only in very first run, which inturn reduces the data shuffling in execution graph of subsequent runs.

EDIT :- After each delete, I would re-insert the data from STAGING to TARGET table. I will update one date field in staging table, then will rerun the DELETE statement (just to have the staging table as incremental dataset with one date field updated) First two image --> First delete execution graph and next two for second delete statement

Can someone please explain on this behaviour. Delete first time Delete first time part 2 Delete second time part 1 Delete second time part 2

Upvotes: 0

Views: 189

Answers (0)

Related Questions