Mustafa Shabib
Mustafa Shabib

Reputation: 740

Update with "not in" on huge table in SQL Server 2005

I have a table with around 115k rows. Something like this:

Table: People
Column: ID PRIMARY KEY INT IDENTITY NOT NULL
Column: SpecialCode NVARCHAR(255) NULL
Column: IsActive BIT NOT NULL

Initially, I had an index defined like so:

PK_IDX (clustered) -- clustered index on primary key
IDX_SpecialCode (non clustered, non-unique) -- index on the SpecialCode column

And I'm doing an update like so:

Update People set IsActive = 0 
Where SpecialCode not in ('...enormous list of special codes....')

This enormous list is essentially 99% of the users in the table.

This update takes forever on my server. As a test I trimmed the list of special codes in the "not in" clause to something like 1% of the users in the table, and my execution plan ends up using an INDEX SCAN on the PK_IDX index instead of the IDX_SpecialCode index that I thought it'd use.

So, I thought that maybe I needed to modify the IDX_SpecialCode so that it included the column "IsActive" in it. I did so and I still see the execution plan defaulting to the PK_IDX index scan and my query still takes a very long time to run.

So - what is the more correct way to do an update of this nature? I have the list of user's I want to exclude from the update, but was trying to avoid loading all employees special codes from the database, filtering out those not in my list on my application side, and then running my query with an in clause, which will be a much much smaller list in my actual usage.

Thanks

Upvotes: 3

Views: 2448

Answers (4)

Stuart Blackler
Stuart Blackler

Reputation: 3772

With SQL-Server indexes are ignored when you use the NOT clause. That is why you are seeing the execution plan ignoring your index. <- Ref: page 6. MCTS Exam 70-433 Database Development SQL 2008 (I'm reading it at the moment)

It might be worth taking a look at Full text indexes although I don't know whether the same will happen with that (I haven't got access to a box with it set up to test at the moment)

hth

Upvotes: 1

Rafael Colucci
Rafael Colucci

Reputation: 6078

I think that the problem is your SpecialCode NVARCHAR(255). Strings comparison in Sql Server are very slow. Consider change your query to work with the IDs. And also, try to avoid the NVarchar. if dont care about Unicode, use Varchar instead.

Also, check your database collation to see if it matches the instance collation. Make sure you are not having hard disk performance issues.

Upvotes: 0

JNK
JNK

Reputation: 65197

If you have the employees you want to exclude, why not just populate an indexed table with those PK_IDs and do a:

Update People 
set IsActive = 0 
Where NOT EXISTS (SELECT NULL
                  FROM lookuptable l
                  WHERE l.PK = People.PK)

You are getting index scans because SQL Server is not stupid, and realizes that it makes more sense to just look at the whole table instead of checking for 100 different criteria one at a time. If your stats are up to date the optimizer knows about how much of the table is covered by your IN statement and will do a table or clustered index scan if it thinks it will be faster.

Upvotes: 3

tofarr
tofarr

Reputation: 7851

Is there any way you could use the IDs of the users you wish to exclude instead of their code - even on indexed values comparing ids may be faster than strings.

Upvotes: 0

Related Questions