Reputation: 9616
I'm having some serious performance issues with a multi-million row table that I feel I should be able to get results from fairly quick. Here's a run down of what I have, how I'm querying it, and how long it's taking:
I'm running SQL Server 2008 Standard, so Partitioning isn't currently an option
I'm attempting to aggregate all views for all inventory for a specific account over the last 30 days.
All views are stored in the following table:
CREATE TABLE [dbo].[LogInvSearches_Daily]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [Inv_ID] [int] NOT NULL, [Site_ID] [int] NOT NULL, [LogCount] [int] NOT NULL, [LogDay] [smalldatetime] NOT NULL, CONSTRAINT [PK_LogInvSearches_Daily] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY]
This table has 132,000,000 records, and is over 4 gigs.
A sample of 10 rows from the table:
ID Inv_ID Site_ID LogCount LogDay -------------------- ----------- ----------- ----------- ----------------------- 1 486752 48 14 2009-07-21 00:00:00 2 119314 51 16 2009-07-21 00:00:00 3 313678 48 25 2009-07-21 00:00:00 4 298863 0 1 2009-07-21 00:00:00 5 119996 0 2 2009-07-21 00:00:00 6 463777 534 7 2009-07-21 00:00:00 7 339976 503 2 2009-07-21 00:00:00 8 333501 570 4 2009-07-21 00:00:00 9 453955 0 12 2009-07-21 00:00:00 10 443291 0 4 2009-07-21 00:00:00 (10 row(s) affected)
/****** Object: Index [IX_LogInvSearches_Daily_LogDay] Script Date: 05/12/2010 11:08:22 ******/ CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_LogDay] ON [dbo].[LogInvSearches_Daily] ( [LogDay] ASC ) INCLUDE ( [Inv_ID], [LogCount]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
I'm using the following query to aggregate the data and give me the top 5 records. This query is currently taking 24 seconds to return the 5 rows:
StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SELECT TOP 5 Sum(LogCount) AS Views , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank , Inv_ID FROM LogInvSearches_Daily D (NOLOCK) WHERE LogDay > DateAdd(d, -30, getdate()) AND EXISTS( SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID ) GROUP BY Inv_ID (1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Top(TOP EXPRESSION:((5))) |--Sequence Project(DEFINE:([Expr1007]=dense_rank)) |--Segment |--Segment |--Sort(ORDER BY:([Expr1006] DESC, [D].[Inv_ID] DESC)) |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1011], [Expr1012], [Expr1010])) | |--Compute Scalar(DEFINE:(([Expr1011],[Expr1012],[Expr1010])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1011] AND [D].[LogDay] < [Expr1012]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOA (13 row(s) affected)
I tried using a CTE to pick up the rows first and aggregate them, but that didn't run any faster, and gives me essentially the same execution plan.
(1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --SET SHOWPLAN_TEXT ON; WITH getSearches AS ( SELECT LogCount -- , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank , D.Inv_ID FROM LogInvSearches_Daily D (NOLOCK) INNER JOIN propertyControlCenter.dbo.Inventory I (NOLOCK) ON Acct_ID = 18731 AND I.Inv_ID = D.Inv_ID WHERE LogDay > DateAdd(d, -30, getdate()) -- GROUP BY Inv_ID ) SELECT Sum(LogCount) AS Views, Inv_ID FROM getSearches GROUP BY Inv_ID (1 row(s) affected) StmtText ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1004]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1007])) | |--Compute Scalar(DEFINE:(([Expr1008],[Expr1009],[Expr1007])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1008] AND [D].[LogDay] < [Expr1009]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID] AS [I]), SEEK:([I].[Acct_ID]=(18731) AND [I].[Inv_ID]=[LOALogs].[dbo].[LogInvSearches_Daily].[Inv_ID] as [D].[Inv_ID]) ORDERED FORWARD) (8 row(s) affected) (1 row(s) affected)
So given that I'm getting good Index Seeks in my execution plan, what can I do to get this running faster?
UPDATE:
Here's the same query run without the DENSE_RANK() and it takes the exact same 24 seconds to run, and gives me the same base query plan:
StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --SET SHOWPLAN_TEXT ON SELECT TOP 5 Sum(LogCount) AS Views , Inv_ID FROM LogInvSearches_Daily D (NOLOCK) WHERE LogDay > DateAdd(d, -30, getdate()) AND EXISTS( SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID ) GROUP BY Inv_ID ORDER BY Views, Inv_ID (1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Sort(TOP 5, ORDER BY:([Expr1006] ASC, [D].[Inv_ID] ASC)) |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1010], [Expr1011], [Expr1009])) | |--Compute Scalar(DEFINE:(([Expr1010],[Expr1011],[Expr1009])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1010] AND [D].[LogDay] < [Expr1011]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOALogs].[dbo].[LogInvS (9 row(s) affected)
Thanks,
Dan
Upvotes: 2
Views: 9687
Reputation: 34177
I've not yet read through your whole question (I'll come to that shortly) but to answer an early comment: you can use partitioned views in SQL Server 2008 standard edition. It's partitioned tables (which are admittedly more flexible) that are restricted to Enterprise edition.
Paritioned views info: http://msdn.microsoft.com/en-us/library/ms190019.aspx
On the wider question I'd like to know if you really need the DENSE_RANK in there. I'm wondering if you're confused between the ORDER BY inside the DENSE_RANK and the ORDER BY of the query itself. As it stands your TOP 5 will return 5 undefined records since SQL Server does not guarantee any order on records unless an ORDER BY clause is specified (which you've not done). If you move the ORDER BY from the DENSE_RANK down to be the whole query ORDER BY as follows, the records will come out as I think you want and it will remove the need for the expensive DENSE_RANK aggregate function.
SELECT TOP 5
SUM([LogCount]) AS [Views],
[Inv_ID]
FROM [LogInvSearches_Daily] D (NOLOCK)
WHERE
[LogDay] > DateAdd(d, -30, getdate())
AND EXISTS(
SELECT *
FROM Inventory (NOLOCK)
WHERE Acct_ID = 18731
AND Inv_ID = D.Inv_ID
)
GROUP BY
Inv_ID
ORDER BY
[Views] DESC,
[Inv_ID]
UPDATE:
The time is probably being used up here:
|--Sort(ORDER BY:([D].[Inv_ID] ASC))
You could try creating a covering index like this one:
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_Perf] ON [dbo].[LogInvSearches_Daily]
(
[Inv_ID] ASC,
[LogDay] ASC
)
INCLUDE
(
[LogCount]
)
Note that I've also altered the ORDER BY slightly (Inv_ID is now sorted ASC instead of DESC). I suspect this change won't affect the results in a problematic way but may help performance since it will be returning rows in the same order that they are grouped (although ths may be irrelevant!).
Upvotes: 1
Reputation: 2238
Acct_ID is on the Inventory table, and seems to have an index to itself (IX_Inventory_Acct_ID). Perhaps if Inventory had an index on (Acct_Id, Inv_Id) and LogInvSearches_Daily was clustered (or at least indexed) around (Inv_Id, LogDay), you'd have more luck.
BTW, I've no idea what the current clustering index on LogInvSearches_Daily.ID is supposed to be buying you. Why is it importing to have records with close IDs close on the disk?
Upvotes: 0
Reputation: 432230
Partitioning aside,
Based on our experience with larger table than yours, we extract data into a temp table (not table variable) and aggregate on that. Not for all queries, but the more complex ones.
Other than that, I agree with Daniel Renshaw's observations about DENSE_RANK
I'd also think about moving [Inv_ID], [LogCount] into the index (not include, perhaps with a DESC sort)
Upvotes: 1