Reputation: 297
In some SQL query performance tuning, I noticed that the following query was running slowly but it wasn't thrashing the CPU and there appeared to be no other system bottlenecks to cause it to run slowly. In fact the CPU average was 15% whilst it ran:
UPDATE: The query in question runs in a cursor loop which contains 800 records:
cursor = SELCT DISTINCT param1,param2, param3, param4
FROM t_Table
GROUP BY param1,param2, param3, param4
ORDER BY param1,param2, param3, param4 DESC
cursor loop:
SELECT @maxval1 = max(iField1),
@maxval2 = max(iField2),
@dtDateMin = Min(dtDate),
@dtDateMax = Max(dtDate)
FROM t_Table
WHERE iSomeField1 = @param1
AND iSomeField2 = @param2
AND iSomeField3 = @param3
AND iSomeField4 = @param4
next
(Note: there are indices setup for iSomeField1-4)
I then broke out the individual min/max parts into four queries to see how the server responded and got full throughput with the CPU peaking at 100% and the block ran in 2 seconds as opposed to > 5mins for the above. Whilst I am indeed happy with the performance increases, I would like some clever DBA to explain why this is and what other tips they would could give on these types of queries?
SELECT TOP 1 @maxval1 = iField1
FROM t_Table
WHERE iSomeField1 = @param1
AND iSomeField2 = @param2
AND iSomeField3 = @param3
AND iSomeField4 = @param4
ORDER BY field1 DESC
SELECT TOP 1 @maxval2 = iField2
FROM t_Table
WHERE iSomeField1 = @param1
AND iSomeField2 = @param2
AND iSomeField3 = @param3
AND iSomeField4 = @param4
ORDER BY field2 DESC
SELECT TOP 1 @dtDateMin = dtDate
FROM t_Table
WHERE iSomeField1 = @param1
AND iSomeField2 = @param2
AND iSomeField3 = @param3
AND iSomeField4 = @param4
ORDER BY dtDate ASC
SELECT TOP 1 @dtDateMax = dtDate
FROM t_Table
WHERE iSomeField1 = @param1
AND iSomeField2 = @param2
AND iSomeField3 = @param3
AND iSomeField4 = @param4
ORDER BY dtDate DESC
Please note I'm a developer and not a DBA but would like to understand alittle more about how SQL Server works behind the scene on queries like the above. To add to some of the answers below, I am aware and did use the query execution planner to analyse the performance issue and what puzzles me is that despite indexes on the table, why the first query performs badly on the SQL server.
UPDATE: Screenshots of CPU usage of query 1 against query 2+ run in a cursor loop:
CPU usage for query 1 and query 2 http://img22.imageshack.us/img22/3262/sqlperfodd.png Query 1 EP http://img513.imageshack.us/img513/5681/query1.png Query 2 EP http://img365.imageshack.us/img365/9715/query2.png
The schema is shown below (this is a cut down version of our real table but the structure and indexs are what we use).
CREATE TABLE [dbo].[t_Table] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[dtDate] [datetime] NULL ,
[iField1] [int] NULL ,[iField2] [int] NULL ,
[iSomeField1] [int] NULL ,[iSomeField2] [int] NULL ,
[iSomeField3] [int] NULL ,[iSomeField4] [int] NULL) ON [PRIMARY]
CREATE CLUSTERED INDEX [IX_dtDate] ON [dbo].[t_Table]([dtDate], [iField1],
[iSomeField1]) WITH FILLFACTOR = 90 ON [PRIMARY]
ALTER TABLE [dbo].[t_Table] ADD
CONSTRAINT [PK_t_Table] PRIMARY KEY NONCLUSTERED
([ID]) ON [PRIMARY]
CREATE INDEX [idx_field1234] ON [dbo].[t_Table]([iSomeField1], [iSomeField2],
[iSomeField3], [iSomeField4]) ON [PRIMARY]
CREATE INDEX [idx_field1] ON [dbo].[t_Table]([iSomeField1]) [PRIMARY]
Upvotes: 1
Views: 311
Reputation: 76
It depends. What indexes are on the table? What volume of rows are within the table? I've just created a sample which performed well but it may be very different to your scenario. As a rule, if the optimizer has trouble then the query needs to be simplified. What you have done maybe what is required. It depends. Here's the SQL I knocked up to see if I could find anything obvious with the show execution plan on.
set nocount on
GO
if object_id('tempdb..#MaxMinExample')is not null drop table #MaxMinExample
GO
create table #MaxMinExample([key] int identity(1,1) primary key clustered,iField1 int,iField2 int,dtDate datetime,iSomeField1 int,iSomeField2 int,iSomeField3 int,iSomeField4 int)
GO
--initial data set which we'll cartesian
insert into #MaxMinExample(iField1 ,iField2 ,dtDate ,iSomeField1 ,iSomeField2 ,iSomeField3 ,iSomeField4 )
values (1,2,getdate(),1,2,3,4)
insert into #MaxMinExample(iField1 ,iField2 ,dtDate ,iSomeField1 ,iSomeField2 ,iSomeField3 ,iSomeField4 )
values (2,3,getdate()+1,4,5,6,7)
insert into #MaxMinExample(iField1 ,iField2 ,dtDate ,iSomeField1 ,iSomeField2 ,iSomeField3 ,iSomeField4 )
values (3,4,getdate()+2,5,6,7,8)
insert into #MaxMinExample(iField1 ,iField2 ,dtDate ,iSomeField1 ,iSomeField2 ,iSomeField3 ,iSomeField4 )
values (5,6,getdate()+3,6,7,8,9)
insert into #MaxMinExample(iField1 ,iField2 ,dtDate ,iSomeField1 ,iSomeField2 ,iSomeField3 ,iSomeField4 )
values (6,7,getdate()+4,7,8,9,10)
GO
--create loads of data
declare @count int set @count=1
while (select count(*) from #MaxMinExample)<865830
begin
insert into #MaxMinExample(iField1 ,iField2 ,dtDate ,iSomeField1 ,iSomeField2 ,iSomeField3 ,iSomeField4 )
select a.iField1+@count ,a.iField2+@count ,a.dtDate+@count ,a.iSomeField1+@count ,a.iSomeField2+@count ,a.iSomeField3+@count ,a.iSomeField4+@count from #MaxMinExample a cross join #MaxMinExample b
set @count=@count+1
end
GO
--create the indexes
create index MaxMinExample_iSomeField1 on #MaxMinExample(iSomeField1)
create index MaxMinExample_iSomeField2 on #MaxMinExample(iSomeField2)
create index MaxMinExample_iSomeField3 on #MaxMinExample(iSomeField3)
create index MaxMinExample_iSomeField4 on #MaxMinExample(iSomeField4)
create index MaxMinExample_dtDate on #MaxMinExample(dtDate)
GO
declare @maxval1 int,@maxval2 int,@dtDateMin datetime,@dtDateMax datetime,@param1 int,@param2 int,@param3 int,@param4 int
select @param1=4,@param2=5,@param3=6,@param4=7
select @maxval1 = max(iField1), @maxval2 = max(iField2), @dtDateMin = Min(dtDate),@dtDateMax = Max(dtDate) from #MaxMinExample
where iSomeField1=@param1 and iSomeField2=@param2 and iSomeField3=@param3 and iSomeField4=@param4
select top 1 @maxval1 = iField1 from #MaxMinExample where iSomeField1=@param1 and iSomeField2=@param2 and iSomeField3=@param3 and iSomeField4=@param4 order by iField1 DESC
select top 1 @maxval2 = iField2 from #MaxMinExample where iSomeField1=@param1 and iSomeField2=@param2 and iSomeField3=@param3 and iSomeField4=@param4 order by iField2 DESC
select top 1 @dtDateMin = dtDate from #MaxMinExample where iSomeField1=@param1 and iSomeField2=@param2 and iSomeField3=@param3 and iSomeField4=@param4 order by dtDate ASC
select top 1 @dtDateMax = dtDate from #MaxMinExample where iSomeField1=@param1 and iSomeField2=@param2 and iSomeField3=@param3 and iSomeField4=@param4 order by dtDate DESC
Upvotes: 1
Reputation: 3517
I have had a look at your query and run some quick tests through it. As Diego mentioned, the execution plan in Query Analyzer is a great tool for troubleshooting this kind of thing.
First of all a note on indexes - when SQL recieves a query it will run through a series of steps - to simplify this process one of the main tasks here is to decide what is the most effecient way to retrieve the data. SQL will look at the table structure, statistics and indexes to determine what it believes will be the optimal path. It will ultimately choose one index to use to retrieve data (never multiple indexes).
The index is basically a lookup to where data is stored on the table (or in the case of the clustered index it actually defines HOW the data is stored on the disk). If your query is using a number of columns (either in the query, where or order clauses) then the query will need to retrieve these so that it can return your query. SQL will look at the index and then look at the order it needs to retrieve indexes in. Ideally SQL will be able to "seek" directly to the data you have requested (the most common alternative is a "scan", which basically means the entire index / structure has been scanned for the data). These terms are used in the above execution plan. There is a lot of additional complexity in the above (for example in some queries you may see bookmark lookups, where SQL uses an index to find the row and then perform a lookup to get any associated data) but that should be a good starting step.
As an aside here I will mention a site that i think is great for SQL performance info - www.sql-server-performance.com
Taking the above and applying to your information we can see that your main query is performing a clustered index "scan" - i.e. it searches the entire table to retrieve the information you requested. This means that no index was found that would allow it to directly seek to the information you required. To execute this query SQL would need to perform a filter on your fields 1 - 4, and then aggregate information (the MAX query) on the field and date columns.
As an extreme example, this index would allow a seek:
CREATE INDEX [idx_field1234567] ON [dbo].[t_Table]( [iSomeField1], [iSomeField2], [iSomeField3], [iSomeField4], [dtDate], [iField1], [iField2] ) WITH FILLFACTOR = 90, PAD_INDEX ON [PRIMARY]
However you should note a lot goes into creating an index, and generally you should avoid adding too many columns to an index and you should avoid too many indexes on a table. In SQL 2000 the index tuning wizard is a good deal to provide a decent baseline of indexes.
As Diego said, the above may seem daunting but the site above was a great reference for me.
Good luck!
Upvotes: 1
Reputation: 297
In the end, the project moved to a new DWH and BI stack based on MicroStrategy so this became a none issue and the new DWH has a different schema and was done in SQL 2008 (but I never got to the bottom it :/)
Upvotes: 0
Reputation:
Unfortunately the information you provide is not enough to give you a precise answer, but I think I can give you an useful hint. SQL Server allows you to view the query plan it uses uses to access the data; such plan will tell you in detail what is accessed, when, in which way, and how many rows are processed in each step. It also tells you how much time/resource consuming is each step, allowing you to find bottleneck easily.
To show the execution plan in Query Analyzer, open the Query Menu and click on "Show Execution Plan". Then run your first query and check the plan; in another window, run the second query and check the plan again. This way you can see what's the difference between them, which indexes (if any) are used and get a better understanding of SQL Server.
One hint: don't be discouraged if at the beginning everything seems complicated, it's just a matter of taking it slowly.
Finally, an useful resource for SQL Server (other than the MSDN, of course) is www.sqlservercentral.com, where you can find answers from users and experts. I hope this helps.
Upvotes: 0