Kev
Kev

Reputation: 119806

Why are my SQL indexes being ignored?

We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead. We can force the use of indexes by using the WITH (INDEX=<index_name>) clause but would prefer not to have to do this.

As a developer I'm very familiar with SQL Server when writing T-SQL, but profiling and performance tuning isn't my strong point. I'm looking for any advice and guidance as to why this might be happening.

Update:

I should have said that we've rebuilt all indexes and updated index statistics.

The table definition for one of the culprits is as follows:

CREATE TABLE [tblinvoices]
(
    [CustomerID] [int] NOT NULL,
    [InvoiceNo] [int] NOT NULL,
    [InvoiceDate] [smalldatetime] NOT NULL,
    [InvoiceTotal] [numeric](18, 2) NOT NULL,
    [AmountPaid] [numeric](18, 2) NULL 
        CONSTRAINT [DF_tblinvoices_AmountPaid]  DEFAULT (0),
    [DateEntered] [smalldatetime] NULL 
        CONSTRAINT [DF_tblinvoices_DateEntered]  DEFAULT (getdate()),
    [PaymentRef] [varchar](110),
    [PaymentType] [varchar](10),
    [SyncStatus] [int] NULL,
    [PeriodStart] [smalldatetime] NULL,
    [DateIssued] [smalldatetime] NULL 
        CONSTRAINT [DF_tblinvoices_dateissued]  DEFAULT (getdate()),
    CONSTRAINT [PK_tblinvoices] PRIMARY KEY NONCLUSTERED 
    (
        [InvoiceNo] ASC
    ) ON [PRIMARY]
) ON [PRIMARY]

There is one other index on this table (the one we want SQL to use):

CustomerID (Non-Unique, Non-Clustered)

The following query performs a table scan instead of using the CustomerID index:

SELECT 
    CustomerID, 
    Sum(InvoiceTotal) AS SumOfInvoiceTotal, 
    Sum(AmountPaid) AS SumOfAmountPaid 
FROM tblInvoices 
WHERE CustomerID = 2112 
GROUP BY customerID

Updated:

In answer to Autocracy's question, both of those queries perform a table scan.

Updated:

In answer to Quassnoi's question about DBCC SHOW_STATISTICS, the data is:

RANGE_HI_KEY    RANGE_ROWS    EQ_ROWS    DISTINCT_RANGE_ROWS    AVG_RANGE_ROWS
1667            246           454        8                      27.33333
2112            911           3427       16                     56.9375
2133            914           775        16                     57.125

Upvotes: 12

Views: 6231

Answers (12)

Jeff Ferland
Jeff Ferland

Reputation: 18282

I think I just found it. I was reading the comments posted to your question before I noted that the two queries I gave you were expected to cause table scan, and I just wanted the result. That said, it caught my interest when somebody said you had no clustered indexes. I read your SQL create statement in detail, and was surprised to note that was the case. This is why it isn't using your CustomerId index.

Your CustomerId index references your primary key of InvoiceNo. Your primary key, however, isn't clustered, so then you'd have to look in that index to find where the row actually is. The SQL server won't do two non-clustered index lookups to find a row. It'll just table scan.

Make your InvoiceNo a clustered index. We can assume those will generally be inserted in ascending manner, and thus the insertion cost won't be much higher. Your query cost, however, will be much lower. Dollars to donuts, it'll use your index then.


Edit: I like BradC's suggestion as well. It's a common DBA trick. Like he says, though, make that primary clustered anyway since this is the CAUSE of your problem. It is very rare to have a table with no clustered index. Most of the time it isn't used, it's a bad idea. That said, his covering index is an improvement ON TOP OF clustering that should be done.

Upvotes: 2

BradC
BradC

Reputation: 39916

The best thing to do is make the index a covering index by including the InvoiceTotal and AmountPaid columns in the CustomerID index. (In SQL 2005, you would add them as "included" columns". In SQL 2000, you have to add them as additional key columns.) If you do that, I'll guarantee the query optimizer will choose your index*.

Explanation: Indexes seem like they would always be useful, but there is a hidden cost to using a (non-covering) index, and that is the "bookmark lookup" that has to be done to retrieve any other columns that might be needed from the main table. This bookmark lookup is an expensive operation, and is (one possible) reason why the query optimizer might not choose to use your index.

By including all needed columns in the index itself, this bookmark lookup is avoided entirely, and the optimizer doesn't have to play this little game of figuring out if using an index is "worth it".

(*) Or I'll refund your StackOverflow points. Just send a self-addressed, stamped envelope to...

Edit: Yes, if your primary key is NOT a clustered index, then by all means, do that, too!! But even with that change, making your CustomerID index a covering index should increase performance by an order of magnitude (10x or better)!!

Upvotes: 6

Quassnoi
Quassnoi

Reputation: 425301

We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead.

Despite 4,302 days that have passed since Aug 29, 1997, SQL Server's optimizer has not evolved into SkyNet yet, and it still can make some incorrect decisions.

Index hints are just the way you, a human being, help the artificial intelligence.

If you are sure that you collected statistics and the optimizer is still wrong, then go on, use the hints.

They are legitimate, correct, documented and supported by Microsoft way to enforce the query plan you want.

In your case:

SELECT CustomerID, 
       SUM(InvoiceTotal) AS SumOfInvoiceTotal, 
       SUM(AmountPaid) AS SumOfAmountPaid 
FROM   tblInvoices 
WHERE  CustomerID = 2112 
GROUP BY
       CustomerID

, the optimizer has two choises:

  • Use the index which implies a nested loop over the index along with KEY LOOKUP to fetch the values of InvoiceTotal and AmountPaid
  • Do not use the index and scan all tables rows, which is faster in rows fetched per second, but longer in terms of total row count.

The first method may or may not be faster than the second one.

The optimizer tries to estimate which method is faster by looking into the statistics, which keep the index selectivity along with other values.

For selective indexes, the former method is faster; for non-selective ones, the latter is.

Could you please run this query:

SELECT  1 - CAST(COUNT(NULLIF(CustomerID, 2112)) AS FLOAT) / COUNT(*)
FROM    tlbInvoices

Update:

Since CustomerID = 2112 covers only 1,4% of your rows, you should benefit from using the index.

Now, could you please run the following query:

DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])

, locate two adjacents rows in the third resultset with RANGE_HI_KEY being less and more than 2112, and post the rows here?

Update 2:

Since the statistics seem to be correct, we can only guess why the optimizer chooses full table scan in this case.

Probably (probably) this is because this very value (2112) occurs in the RANGE_HI_KEY and the optimizer sees that it's unusually dense (3427 values for 2112 alone against only 911 for the whole range from 1668 to 2111)

Could you please do two more things:

  1. Run this query:

    DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
    

    and post the first two resultsets.

    • Run this query:

      SELECT TOP 1 CustomerID, COUNT(*) FROM tblinvoices WHERE CustomerID BETWEEN 1668 AND 2111

    , use the top CustomerID from the query above in your original query:

    SELECT CustomerID, 
           SUM(InvoiceTotal) AS SumOfInvoiceTotal, 
           SUM(AmountPaid) AS SumOfAmountPaid 
    FROM   tblInvoices 
    WHERE  CustomerID = @Top_Customer
    GROUP BY
           CustomerID
    

    and see what plan will it generate.

Upvotes: 5

Eric Sabine
Eric Sabine

Reputation: 1165

I would start testing to see if you can change the primary key to a clustered index. Right now the table is considered a "heap". If you can't do this then I would also consider creating a view with a clustered index but first you'd have to change the "AmountPaid" column to NOT NULL. It already defaults to zero so this might be an easy change. For the view I'd try something similar to this.

SET QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
GO
SET NUMERIC_ROUNDABORT OFF
GO

IF EXISTS 
  (
         SELECT TABLE_NAME
           FROM INFORMATION_SCHEMA.VIEWS 
          WHERE TABLE_NAME = N'CustomerInvoiceSummary'
  )
           DROP VIEW dbo.CustomerInvoiceSummary
GO

CREATE VIEW dbo.CustomerInvoiceSummary WITH SCHEMABINDING
AS

  SELECT a.CustomerID
       , Sum(a.InvoiceTotal) AS SumOfInvoiceTotal
       , Sum(a.AmountPaid)   AS SumOfAmountPaid 
       , COUNT_BIG(*)                     AS CT
    FROM dbo.tblInvoices a
GROUP BY a.CustomerID

GO
CREATE UNIQUE CLUSTERED INDEX CustomerInvoiceSummary_CLI ON dbo.CustomerInvoiceSummary ( CustomerID )
GO

Upvotes: 2

SqlACID
SqlACID

Reputation: 4014

Have you tried

exec sp_recompile tblInvoices

...just to make sure you're not using a cached bad plan?

Upvotes: 1

NeedHack
NeedHack

Reputation: 2993

Have you tried adding the other columns to your index? i.e. InvoiceTotal and AmountPaid.

The idea being that the query will be "covered" by the index, and won't have to refer back to the table.

Upvotes: 2

Mitch Wheat
Mitch Wheat

Reputation: 300529

The most common reasons for indexes to be ignored are

  • Columns involved are not selective enough (optimiser decides tables scans will be faster, due to 'visiting' a large amount of rows)

  • There are a large number of columns involved in SELECT/GROUP BY/ORDER BY and would involve a lookup into the clustered index after using the index

  • Statistics being out of date (or skewed by a large number of inserts or deletes)

Do you have a regular index maintenance job running? (it is quite common for it to be missing in Dev environment).

Upvotes: 3

Jeff Ferland
Jeff Ferland

Reputation: 18282

Several others have pointed out that your database may need the index statistics updated. You may also have such a high percentage of rows in the database that it would be faster to sequentially read the table than to seek across the disk to find every one. SQL Server has a fancy GUI query analyzer that will tell you what the database thinks the cost of various activiites is. You can open that up and see exactly what it was thinking.

We can give you more solid answers if you can give us:

Select * from tblinvoices;
Select * from tblinvoices where CustomerID = 2112;

Use that query analyzer, and update your statistics. One last hint: you can use index hints to force it to use your index if you're sure it's just being stupid after you've done everything else.

Upvotes: 1

Remus Rusanu
Remus Rusanu

Reputation: 294227

Latest post from Kimberly covers exactly this topic: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx

SQL Server uses a cost based optimizer and if the optimizer calculates that the cost of looking up the index keys and then look up the clustered index to retrieve the rest of the columns is higher than the cost of scanning the table, then it will scan the table instead. The 'tipping' point is actually surprisingly low.

Upvotes: 2

gbn
gbn

Reputation: 432210

Are you using "SELECT * FROM ..."? This generally results in scans.

We'd need schema, indexes and sample queries to help more

Upvotes: 0

D&#39;Arcy Rittich
D&#39;Arcy Rittich

Reputation: 171371

Try updating your statistics. These statistics are the basis for the decisions made by the compiler about whether it should use an index or not. They contain information such as cardinality and number of rows for each table.

E.g., if the statistics have not been updated since you did a large bulk import, the compiler may still think the table has only 10 rows in it, and not bother with an index.

Upvotes: 0

Tim C
Tim C

Reputation: 70598

You could also try doing an UPDATE STATISTICS on the table (or tables) involved in the query. Not that I fully understand statistics in SQL, but I do know it is something our DBAs do occasionally (with an weekly job being scheduled to update stats on the larger and frequently changed tables).

SQL Statistics

Upvotes: 0

Related Questions