Reputation: 4346
I'm running a query over a table variable that holds 22 227 rows. The query used to take 2-3 seconds to complete (which I still think is too slow) but since I added another field to the ORDER BY
clause in DENSE_RANK()
it now completes in 4.5 minutes!
If I include [t2].[aisdt]
with or without [t2].[aiID]
, the execution plan shows that it's scanning 494 039 529 rows, which is 22 227 squared. The following query generates the correct results, just much too slowly to be useful.
SELECT MAX([t].[SetNum]) OVER (PARTITION BY NULL) AS [MaxSet]
,*
FROM (
SELECT DENSE_RANK() OVER (ORDER BY [t2].[aisdt], [t2].[aiID]) AS [SetNum]
,[t2].*
FROM (
SELECT [aiID]
,COUNT(DISTINCT [acID]) AS [noac]
FROM @Temp
GROUP BY [aiID]
) [t1]
JOIN @Temp [t2]
ON [t2].[aiID] = [t1].[aiID]
WHERE [t1].[noac] < [t2].[asm]
) [t]
Just to be clear, the culprit is the bold section in "DENSE_RANK() OVER (ORDER BY [t2].[aisdt], [t2].[aiID])". Removing this field (which needs to remain) drops the execution time back down to 2-3 seconds. I think it might have something to do with JOIN
ing the table to itself on [aiID]
but not [aisdt]
.
How can I speed this query up to complete in the same time as before, or less?
Table definition:
DECLARE @Temp TABLE (
[aiID] INT NOT NULL INDEX [IX_Temp_aiID] -- not unique
,[aisdt] DATETIME NOT NULL INDEX [IX_Temp_aisdt] -- not unique
,[asm] INT NOT NULL
,[cpcID] INT NULL
,[cpce] VARCHAR(10) NULL
,[acID] INT NULL
,[ctvID] INT NULL
,[ct] VARCHAR(100) NULL
,[_36_other_non_matched_fields_] VARCHAR(MAX)
,UNIQUE ([aiID], [cpcID], [cpce], [acID], [ctvID], [ct])
)
[aisdt]
is unique per [aiID]
, but there can be multiple [aiID]
s with the same [aisdt]
.
INSERT INTO @TEMP
VALUES (64, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah')
,(64, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 32, '', 32, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 32, '', 47, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 47, '', 32, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 47, '', 47, NULL, NULL, 'blah')
,(54, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah')
,(54, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 32, '', 32, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 32, '', 47, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 47, '', 32, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 47, '', 47, NULL, NULL, 'blah')
It must be sorted by [aisdt]
(datetime) first, then [aiID]
, then numbered into sets based on [aiID]
.
I want to see:
5, 1, 54, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah'
5, 1, 54, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah'
5, 2, 64, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah'
5, 2, 64, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 32, '', 32, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 32, '', 47, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 47, '', 32, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 47, '', 47, NULL, NULL, 'blah'
Upvotes: 2
Views: 133
Reputation: 32693
The main idea is taken from Partition Function COUNT() OVER possible using DISTINCT that @Jayvee pointed out with a small addition that would make it work when acID
has NULL
values.
Most likely you can remove all indexes from your @Temp
table, the server will have to sort it in several different ways for different window functions anyway, but there is no self-join, so it should be faster.
The plan will have many sorts and they also can be slow, especially when engine underestimates the number of rows in a table. And table variable is exactly this case. Optimiser thinks that table variable has only 1 row. So, I'd recommend to use a classic #Temp
table here, even without indexes.
An index on (aiID, acID)
should help, but there will be other sorts any way.
WITH
CTE_Counts
AS
(
SELECT
*
-- use DENSE_RANK() to calculate COUNT(DISTINCT)
, DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID])
+ DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID] DESC)
-- subtract extra 1 if acID has NULL values within the partition
- MAX(CASE WHEN [acID] IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY [aiID])
- 1 AS [noac]
FROM @Temp
)
,CTE_SetNum
AS
(
SELECT
*
, DENSE_RANK() OVER (ORDER BY [aisdt], [aiID]) AS [SetNum]
FROM CTE_Counts
WHERE [noac] < [asm]
)
SELECT
*
, MAX([SetNum]) OVER () AS [MaxSet]
FROM CTE_SetNum
ORDER BY
[aisdt]
,[aiID]
,[SetNum]
;
Upvotes: 2
Reputation: 10875
Index as suggested in the comments would definitely play a major part but also I think you can re-write the query without self join in this way:
SELECT MAX([t].[SetNum]) OVER (PARTITION BY NULL) AS [MaxSet]
,*
FROM (
SELECT DENSE_RANK() OVER (ORDER BY [t1].[aisdt], [t1].[aiID]) AS [SetNum]
,[t1].*
FROM (
SELECT * ,dense_rank() over(partition by aiID order by [acID]) -
dense_rank() over(partition by aiID order by [acID]) - 1 AS [noac]
FROM @Temp
) [t1]
WHERE [t1].[noac] < [t1].[asm]
) [t]
Upvotes: 1