Reputation: 1902
I have a SQL Server 2008 R2 database table with 12k address records where I am trying to filter out duplicate phone numbers and flag them using the following query
SELECT a1.Id, a2.Id
FROM Addresses a1
INNER JOIN Addresses a2 ON a1.PhoneNumber = a2.PhoneNumber
WHERE a1.Id < a2.Id
Note: I realize that there is another way to solve this problem by using EXISTS
, but this is not part of the discussion.
The table has a Primary Key on the ID field with a Clustered Index, the fragmentation level is 0 and the phone number field is not null and has about 130 duplicates out of the 12k records. To make sure it is not a server or database instance issue I ran it on 4 different systems.
Execution of the query takes several minutes, sometimes several hours. After trying almost everything as one of my last steps I removed the Primary Key and ran the query without it and voila it executed in under 1 second. I added the Primary Key back and it still ran in under one second.
Does anybody have an idea what is causing this problem?
Is it possible that the primary key gets somehow corrupted?
EDIT: My apologies I had a couple of typos in the Sql Query
Upvotes: 1
Views: 1179
Reputation: 432331
Out of data statistics. Dropping and recreating the PK will give up fresh statistics.
Too late now, but I'd have suggest running sp_updatestats to see what happened.
If you backup and restore a database onto different systems, statistics follow the data
I'd suspect a different plan too after non-indexed (I guess) columns PhoneNumber and CCAPhoneN
Upvotes: 3
Reputation: 65177
I'm guessing there are no indexes on PhoneNumber
or PhoneNo
.
You are joining on these fields, but if they aren't indexed it's forcing TWO table scans, one for each instance of the table in the query, then probably doing a hash match to find matching records.
Next step - get an exec plan and see what the pain points are.
Then, add indexes to those fields (assuming you see a Clustered Index Scan) and see if that fixes it.
I think your other issue is a red herring. The PK likely has nothing to do with it, but you may have gotten page caching (did you drop the buffers and clear the cache between runs?) which made the later runs faster.
Upvotes: 2