Reputation: 131
I have been working to optimize a procedure created by another programmer at my work by re-writting the entire query to use block statements rather than cursors. I have run into a problem however. I let my procedure run for an hour and it never finished (whereas the cursor procedure finished in 30 minutes). I started debugging and found it never getting past the following query:
-- Create our temp table
create table #tData(
First varchar(50) null,
Middle varchar(50) null,
Last varchar(50) null,
Address varchar(50) null,
Address2 varchar(50) null,
City varchar(30) null,
State varchar(2) null,
Zip varchar(20) null,
Phone varchar(20) null,
SSN varchar(20) null,
DOB varchar(15) null,
Gender varchar(1) null,
cID int null,
pNum varchar(50) null,
POFlag bit null,
MatchFound bit null,
Status varchar(10) null
)
-- Index the temp table
create clustered index IX_tData_Clust on #tData(First, Middle, Last, Address, SSN, DOB)
create index IX_tData on #tData(First, Middle, Last, Address, SSN, DOB)
-- Look for matches
update
#tData
set
#tData.MatchFound = 1,
#tData.cID = ClientDatabase.cID,
#tData.pNum = ClientDatabase.pNum
from
#tData,
ClientDatabase
where
#tData.Status = 'ACTIVE'
and (
(
#tData.SSN is not null
and ClientDatabase.SSN is not null
and #tData.SSN = ClientDatabase.SSN
)
or
(
#tData.First is not null
and ClientDatabase.FirstName is not null
and #tData.Last is not null
and ClientDatabase.LastName is not null
and #tData.Address is not null
and ClientDatabase.AddressLine1 is not null
and #tData.First = ClientDatabase.FirstName
and #tData.Last = ClientDatabase.LastName
and #tData.Address = ClientDatabase.AddressLine1
)
)
Basically the above query has a temp table (#tData) of data that was bulk inserted from a text file. It is then trying to find matches in the ClientDatabase (which has over 5 million entries). Once it has found a match, it updates the MatchFound, cID, and pNum cells of the table row. I have tested it with small insert files (5 rows) and it seems to complete just fine. When I get into the thousands, like most of our files will be, it just poops out.
I did some further research and debugging and found the machine it is currently running on is being maxed on all 6 cores and it appears to be just hanging after that point until the procedure is forced to quit.
Does anyone have any suggestions on how I might be able to optimize the above query or perhaps how to make it so SQL can better manage the query? Any help is much appreciated.
Upvotes: 0
Views: 163
Reputation: 16137
Since the first part in your WHERE
clause reads:
where
#tData.Status = 'ACTIVE'
-- ...
You need to put an INDEX
on the Status
field.
Second, I would break the UPDATE
statement in two statements, one for each OR
-part. Also remove the NULL
checks because when you finally compare the fields, the fields cannot be null for the condition to be true (unless you have ANSI_NULLS
set to OFF
, but I doubt that):
update
-- snipped for brevity
where
#tData.Status = 'ACTIVE'
and #tData.SSN = ClientDatabase.SSN
Plus
update
-- snipped for brevity
where
#tData.Status = 'ACTIVE'
and #tData.First = ClientDatabase.FirstName
and #tData.Last = ClientDatabase.LastName
and #tData.Address = ClientDatabase.AddressLine1
Third, now that things have cleared up, you will see that these UPDATE
statements are easier to read using new-school JOIN statements (read Bad habits to kick : using old-style JOINs).
update
#tData
set
#tData.MatchFound = 1,
#tData.cID = ClientDatabase.cID,
#tData.pNum = ClientDatabase.pNum
from
#tData
INNER JOIN ClientDatabase ON
#tData.SSN = ClientDatabase.SSN
where
#tData.Status = 'ACTIVE'
Plus
update
#tData
set
#tData.MatchFound = 1,
#tData.cID = ClientDatabase.cID,
#tData.pNum = ClientDatabase.pNum
from
#tData
INNER JOIN ClientDatabase ON
#tData.First = ClientDatabase.FirstName
and #tData.Last = ClientDatabase.LastName
and #tData.Address = ClientDatabase.AddressLine1
where
#tData.Status = 'ACTIVE'
Fourth: Since the lookup for the match is in table ClientDatabase
I think you need indices in ClientDatabase
. One for SSN
lookup, one for the FirstName + LastName + AddressLine1
lookup.
Inspecting the Actual Execution Plan will most likely reveal that. If such indices do no exist it will likely result in several table scans which will wreck performance.
create nonclustered index IX_ClientDatabase_SSN on ClientDatabase(SSN);
create nonclustered index IX_ClientDatabase_Name_Address on ClientDatabase(FirstName,LastName,AddressLine1);
Upvotes: 1
Reputation: 17354
I dont exactly know why are you updating a temp (#temporary) table but I will give some suggestions.
Updating #temp table is probably a bad idea in your case because you have millions of records to match against and #temp table wont have indexes.
A better approach would be create a table which has indices on it, then insert data into it, then match it against index column in your main table, that way your query will be dramatically faster. So something like this
CREATE #TABLE(colum1 as primary key, columne2,.... columnn) INSERT INTO #TABLE() VALUES()
Now your temp table will be just as fast and your query should be really fast as well. Note the above code is pseudo.
Upvotes: 0