Best approach to improve query performance of updating large table contents

I have to perform Update of a relatively very large table (80M of records) Invoice_Payment. It should update data from another table Invoice_Payment_updated which is 10%-15% of Invoice_Payment in row count. To illustrate please take a look at the following demo tables:

   Invoice_Payment                         Invoice_Payment_updated
   ---------------                         -----------------------                    
   Customer_id   Invoice_no                Id   Cust_id   Invoice_no      
     10          10100001                  1     10       20200100        
     11          10100002                  2     11       20200101        
     12          10100003
     13          10100004

I know Merge is usually used to perform an UPSERT and it takes several times longer to execute than equivalent Update statement. But in comparison, there are some cases that a normal update statement with multiple sub-queries gives lower performance.

MERGE INTO Invoice_Payment ip
USING (SELECT ipu.Cust_id, ipu.Invoice_no from Invoice_Payment_updated ipu
       INNER JOIN Invoice_Payment ip ON ip.Customer_id = ipu.Cust_id
       WHERE ipu.Cust_id = ip.Customer_id and ipu.Invoice_no <> ip.Invoice_no) t
ON (ip.Customer_id = t.Cust_id)
WHEN MATCHED THEN
UPDATE SET ip.Invoice_no = t.Invoice_no;

To improve performance, I can batch up the updates using ROWCOUNT, but that won't speed up the execution, it'll only help with reducing overall locking.

Following simple Update statement which returns same output:

UPDATE Invoice_Payment 
SET Invoice_no = (SELECT ipu.Invoice_no
                  FROM Invoice_Payment_updated ipu
                  WHERE ipu.Cust_id = Invoice_Payment.Customer_id 
                  AND ipu.Invoice_no <> Invoice_Payment.Invoice_no)

WHERE EXISTS (SELECT 1
              FROM Invoice_Payment_updated ipu
              WHERE ipu.Cust_id = Invoice_Payment.Customer_id
              AND ipu.Invoice_no <> Invoice_Payment.Invoice_no);

Idea to use SQL Merge and Update is a very clever one but I heard that both of them fails in performance issues when I need to update many records (i.e. over 75M) in a big and wide table. Furthermore, recreating the full table is a lot of IO load, not to mention it'll take up a lot of space to basically have the table stored several times temporarily because of using subqueries.

Another approach to resolve this issue using temporary table:

CREATE TABLE tmp (
    Cust_id int,
    Invoice_no int);

INSERT INTO tmp_stage VALUES
(SELECT ipu.Cust_id, ipu.Invoice_no FROM Invoice_Payment_updated ipu
 INNER JOIN Invoice_Payment ip ON ip.Customer_id = ipu.Cust_id
 WHERE ipu.Cust_id = ip.Customer_id and ipu.Invoice_no <> ip.Invoice_no);

UPDATE (SELECT tmp.Cust_id, ip.Customer_id, tmp.Invoice_no, tgt.Invoice_no
        FROM tmp INNER JOIN Invoice_Payment ip
        ON tmp.Cust_id = ip.Customer_id)
SET tmp.Invoice_no = ip.Invoice_no;

I want to figure out which one is better to use in case of multiple Subqueries?

Any thoughts are welcome and a totally different solution to the original problem is much appreciated.

Upvotes: 2

Views: 121

Answers (1)

Anton
Anton

Reputation: 2882

UPDATE i
SET i.Invoice_no = io.Invoice_no
FROM Invoice_Payment i
   INNER JOIN Invoice_Payment_updated io on i.Customer_id = io.cust_id
WHERE i.Invoice_no <> iu.Invoice_no    -- assuming Invoice_no cannot be NULL

If that update takes too much time, add WHILE loop and update TOP (10000) till @@ROWCOUNT = 0. Batching mode CAN improve the performance.

Upvotes: 2

Related Questions