Reputation: 119806
We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead. We can force the use of indexes by using the WITH (INDEX=<index_name>)
clause but would prefer not to have to do this.
As a developer I'm very familiar with SQL Server when writing T-SQL, but profiling and performance tuning isn't my strong point. I'm looking for any advice and guidance as to why this might be happening.
Update:
I should have said that we've rebuilt all indexes and updated index statistics.
The table definition for one of the culprits is as follows:
CREATE TABLE [tblinvoices]
(
[CustomerID] [int] NOT NULL,
[InvoiceNo] [int] NOT NULL,
[InvoiceDate] [smalldatetime] NOT NULL,
[InvoiceTotal] [numeric](18, 2) NOT NULL,
[AmountPaid] [numeric](18, 2) NULL
CONSTRAINT [DF_tblinvoices_AmountPaid] DEFAULT (0),
[DateEntered] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_DateEntered] DEFAULT (getdate()),
[PaymentRef] [varchar](110),
[PaymentType] [varchar](10),
[SyncStatus] [int] NULL,
[PeriodStart] [smalldatetime] NULL,
[DateIssued] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_dateissued] DEFAULT (getdate()),
CONSTRAINT [PK_tblinvoices] PRIMARY KEY NONCLUSTERED
(
[InvoiceNo] ASC
) ON [PRIMARY]
) ON [PRIMARY]
There is one other index on this table (the one we want SQL to use):
CustomerID (Non-Unique, Non-Clustered)
The following query performs a table scan instead of using the CustomerID
index:
SELECT
CustomerID,
Sum(InvoiceTotal) AS SumOfInvoiceTotal,
Sum(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY customerID
Updated:
In answer to Autocracy's question, both of those queries perform a table scan.
Updated:
In answer to Quassnoi's question about DBCC SHOW_STATISTICS
, the data is:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
1667 246 454 8 27.33333
2112 911 3427 16 56.9375
2133 914 775 16 57.125
Upvotes: 12
Views: 6231
Reputation: 18282
I think I just found it. I was reading the comments posted to your question before I noted that the two queries I gave you were expected to cause table scan, and I just wanted the result. That said, it caught my interest when somebody said you had no clustered indexes. I read your SQL create statement in detail, and was surprised to note that was the case. This is why it isn't using your CustomerId index.
Your CustomerId index references your primary key of InvoiceNo. Your primary key, however, isn't clustered, so then you'd have to look in that index to find where the row actually is. The SQL server won't do two non-clustered index lookups to find a row. It'll just table scan.
Make your InvoiceNo a clustered index. We can assume those will generally be inserted in ascending manner, and thus the insertion cost won't be much higher. Your query cost, however, will be much lower. Dollars to donuts, it'll use your index then.
Edit: I like BradC's suggestion as well. It's a common DBA trick. Like he says, though, make that primary clustered anyway since this is the CAUSE of your problem. It is very rare to have a table with no clustered index. Most of the time it isn't used, it's a bad idea. That said, his covering index is an improvement ON TOP OF clustering that should be done.
Upvotes: 2
Reputation: 39916
The best thing to do is make the index a covering index by including the InvoiceTotal and AmountPaid columns in the CustomerID index. (In SQL 2005, you would add them as "included" columns". In SQL 2000, you have to add them as additional key columns.) If you do that, I'll guarantee the query optimizer will choose your index*.
Explanation: Indexes seem like they would always be useful, but there is a hidden cost to using a (non-covering) index, and that is the "bookmark lookup" that has to be done to retrieve any other columns that might be needed from the main table. This bookmark lookup is an expensive operation, and is (one possible) reason why the query optimizer might not choose to use your index.
By including all needed columns in the index itself, this bookmark lookup is avoided entirely, and the optimizer doesn't have to play this little game of figuring out if using an index is "worth it".
(*) Or I'll refund your StackOverflow points. Just send a self-addressed, stamped envelope to...
Edit: Yes, if your primary key is NOT a clustered index, then by all means, do that, too!! But even with that change, making your CustomerID index a covering index should increase performance by an order of magnitude (10x or better)!!
Upvotes: 6
Reputation: 425301
We're having a problem where indexes on our tables are being ignored and
SQL Server 2000
is performing table scans instead.
Despite 4,302
days that have passed since Aug 29, 1997
, SQL Server
's optimizer has not evolved into SkyNet
yet, and it still can make some incorrect decisions.
Index hints are just the way you, a human being, help the artificial intelligence.
If you are sure that you collected statistics and the optimizer is still wrong, then go on, use the hints.
They are legitimate, correct, documented and supported by Microsoft
way to enforce the query plan you want.
In your case:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY
CustomerID
, the optimizer has two choises:
KEY LOOKUP
to fetch the values of InvoiceTotal
and AmountPaid
rows fetched per second
, but longer in terms of total row count.The first method may or may not be faster than the second one.
The optimizer tries to estimate which method is faster by looking into the statistics, which keep the index selectivity along with other values.
For selective indexes, the former method is faster; for non-selective ones, the latter is.
Could you please run this query:
SELECT 1 - CAST(COUNT(NULLIF(CustomerID, 2112)) AS FLOAT) / COUNT(*)
FROM tlbInvoices
Update:
Since CustomerID = 2112
covers only 1,4%
of your rows, you should benefit from using the index.
Now, could you please run the following query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
, locate two adjacents rows in the third resultset with RANGE_HI_KEY
being less and more than 2112
, and post the rows here?
Update 2:
Since the statistics seem to be correct, we can only guess why the optimizer chooses full table scan in this case.
Probably (probably) this is because this very value (2112
) occurs in the RANGE_HI_KEY
and the optimizer sees that it's unusually dense (3427
values for 2112
alone against only 911
for the whole range from 1668
to 2111
)
Could you please do two more things:
Run this query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
and post the first two resultsets.
Run this query:
SELECT TOP 1 CustomerID, COUNT(*) FROM tblinvoices WHERE CustomerID BETWEEN 1668 AND 2111
, use the top CustomerID
from the query above in your original query:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = @Top_Customer
GROUP BY
CustomerID
and see what plan will it generate.
Upvotes: 5
Reputation: 1165
I would start testing to see if you can change the primary key to a clustered index. Right now the table is considered a "heap". If you can't do this then I would also consider creating a view with a clustered index but first you'd have to change the "AmountPaid" column to NOT NULL. It already defaults to zero so this might be an easy change. For the view I'd try something similar to this.
SET QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
IF EXISTS
(
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = N'CustomerInvoiceSummary'
)
DROP VIEW dbo.CustomerInvoiceSummary
GO
CREATE VIEW dbo.CustomerInvoiceSummary WITH SCHEMABINDING
AS
SELECT a.CustomerID
, Sum(a.InvoiceTotal) AS SumOfInvoiceTotal
, Sum(a.AmountPaid) AS SumOfAmountPaid
, COUNT_BIG(*) AS CT
FROM dbo.tblInvoices a
GROUP BY a.CustomerID
GO
CREATE UNIQUE CLUSTERED INDEX CustomerInvoiceSummary_CLI ON dbo.CustomerInvoiceSummary ( CustomerID )
GO
Upvotes: 2
Reputation: 4014
Have you tried
exec sp_recompile tblInvoices
...just to make sure you're not using a cached bad plan?
Upvotes: 1
Reputation: 2993
Have you tried adding the other columns to your index? i.e. InvoiceTotal and AmountPaid.
The idea being that the query will be "covered" by the index, and won't have to refer back to the table.
Upvotes: 2
Reputation: 300529
The most common reasons for indexes to be ignored are
Columns involved are not selective enough (optimiser decides tables scans will be faster, due to 'visiting' a large amount of rows)
There are a large number of columns involved in SELECT/GROUP BY/ORDER BY and would involve a lookup into the clustered index after using the index
Statistics being out of date (or skewed by a large number of inserts or deletes)
Do you have a regular index maintenance job running? (it is quite common for it to be missing in Dev environment).
Upvotes: 3
Reputation: 18282
Several others have pointed out that your database may need the index statistics updated. You may also have such a high percentage of rows in the database that it would be faster to sequentially read the table than to seek across the disk to find every one. SQL Server has a fancy GUI query analyzer that will tell you what the database thinks the cost of various activiites is. You can open that up and see exactly what it was thinking.
We can give you more solid answers if you can give us:
Select * from tblinvoices;
Select * from tblinvoices where CustomerID = 2112;
Use that query analyzer, and update your statistics. One last hint: you can use index hints to force it to use your index if you're sure it's just being stupid after you've done everything else.
Upvotes: 1
Reputation: 294227
Latest post from Kimberly covers exactly this topic: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx
SQL Server uses a cost based optimizer and if the optimizer calculates that the cost of looking up the index keys and then look up the clustered index to retrieve the rest of the columns is higher than the cost of scanning the table, then it will scan the table instead. The 'tipping' point is actually surprisingly low.
Upvotes: 2
Reputation: 432210
Are you using "SELECT * FROM ..."? This generally results in scans.
We'd need schema, indexes and sample queries to help more
Upvotes: 0
Reputation: 171371
Try updating your statistics. These statistics are the basis for the decisions made by the compiler about whether it should use an index or not. They contain information such as cardinality and number of rows for each table.
E.g., if the statistics have not been updated since you did a large bulk import, the compiler may still think the table has only 10 rows in it, and not bother with an index.
Upvotes: 0
Reputation: 70598
You could also try doing an UPDATE STATISTICS on the table (or tables) involved in the query. Not that I fully understand statistics in SQL, but I do know it is something our DBAs do occasionally (with an weekly job being scheduled to update stats on the larger and frequently changed tables).
Upvotes: 0