TSQL: How to optimize this update query

Question

I have been working to optimize a procedure created by another programmer at my work by re-writting the entire query to use block statements rather than cursors. I have run into a problem however. I let my procedure run for an hour and it never finished (whereas the cursor procedure finished in 30 minutes). I started debugging and found it never getting past the following query:

    -- Create our temp table
    create table #tData(
        First varchar(50) null,
        Middle varchar(50) null,
        Last varchar(50) null,
        Address varchar(50) null,
        Address2 varchar(50) null,
        City varchar(30) null,
        State varchar(2) null,
        Zip varchar(20) null,
        Phone varchar(20) null,
        SSN varchar(20) null,
        DOB varchar(15) null,
        Gender varchar(1) null,
        cID int null,
        pNum varchar(50) null,
        POFlag bit null,
        MatchFound bit null,
        Status varchar(10) null
    )

    -- Index the temp table
    create clustered index IX_tData_Clust on #tData(First, Middle, Last, Address, SSN, DOB)
    create index IX_tData on #tData(First, Middle, Last, Address, SSN, DOB)

    -- Look for matches
    update 
        #tData 
    set 
        #tData.MatchFound = 1,
        #tData.cID = ClientDatabase.cID,
        #tData.pNum = ClientDatabase.pNum
    from
        #tData,
        ClientDatabase
    where
        #tData.Status = 'ACTIVE'
        and (
                (
                    #tData.SSN is not null
                    and ClientDatabase.SSN is not null
                    and #tData.SSN = ClientDatabase.SSN
                ) 
                or 
                (
                    #tData.First is not null
                    and ClientDatabase.FirstName is not null
                    and #tData.Last is not null
                    and ClientDatabase.LastName is not null
                    and #tData.Address is not null
                    and ClientDatabase.AddressLine1 is not null
                    and #tData.First = ClientDatabase.FirstName
                    and #tData.Last = ClientDatabase.LastName
                    and #tData.Address = ClientDatabase.AddressLine1
                )
            )

Basically the above query has a temp table (#tData) of data that was bulk inserted from a text file. It is then trying to find matches in the ClientDatabase (which has over 5 million entries). Once it has found a match, it updates the MatchFound, cID, and pNum cells of the table row. I have tested it with small insert files (5 rows) and it seems to complete just fine. When I get into the thousands, like most of our files will be, it just poops out.

I did some further research and debugging and found the machine it is currently running on is being maxed on all 6 cores and it appears to be just hanging after that point until the procedure is forced to quit.

Does anyone have any suggestions on how I might be able to optimize the above query or perhaps how to make it so SQL can better manage the query? Any help is much appreciated.

TT. · Accepted Answer

Since the first part in your WHERE clause reads:

where
    #tData.Status = 'ACTIVE'
    -- ...

You need to put an INDEX on the Status field.

Second, I would break the UPDATE statement in two statements, one for each OR-part. Also remove the NULL checks because when you finally compare the fields, the fields cannot be null for the condition to be true (unless you have ANSI_NULLS set to OFF, but I doubt that):

update 
-- snipped for brevity 
where
    #tData.Status = 'ACTIVE'
    and #tData.SSN = ClientDatabase.SSN

Plus

update 
-- snipped for brevity 
where
    #tData.Status = 'ACTIVE'
    and #tData.First = ClientDatabase.FirstName
    and #tData.Last = ClientDatabase.LastName
    and #tData.Address = ClientDatabase.AddressLine1

Third, now that things have cleared up, you will see that these UPDATE statements are easier to read using new-school JOIN statements (read Bad habits to kick : using old-style JOINs).

update 
    #tData 
set 
    #tData.MatchFound = 1,
    #tData.cID = ClientDatabase.cID,
    #tData.pNum = ClientDatabase.pNum
from
    #tData
    INNER JOIN ClientDatabase ON
        #tData.SSN = ClientDatabase.SSN
where
    #tData.Status = 'ACTIVE'

Plus

update 
    #tData 
set 
    #tData.MatchFound = 1,
    #tData.cID = ClientDatabase.cID,
    #tData.pNum = ClientDatabase.pNum
from
    #tData
    INNER JOIN ClientDatabase ON
        #tData.First = ClientDatabase.FirstName
        and #tData.Last = ClientDatabase.LastName
        and #tData.Address = ClientDatabase.AddressLine1
where
    #tData.Status = 'ACTIVE'

Fourth: Since the lookup for the match is in table ClientDatabase I think you need indices in ClientDatabase. One for SSN lookup, one for the FirstName + LastName + AddressLine1 lookup.

Inspecting the Actual Execution Plan will most likely reveal that. If such indices do no exist it will likely result in several table scans which will wreck performance.

create nonclustered index IX_ClientDatabase_SSN on ClientDatabase(SSN);
create nonclustered index IX_ClientDatabase_Name_Address on ClientDatabase(FirstName,LastName,AddressLine1);

TSQL: How to optimize this update query

Answers (2)

Related Questions