Reputation: 275
I want to find classical Greatest N per Group. I have two methods to solve this problem
Dense_rank Over()
method Min Over()
methodBoth work flawlessly. Now I want to find which one is better and why.
Sample data:
CREATE TABLE #test
(
id INT,
NAME VARCHAR(50),
dates DATETIME
)
;WITH cte
AS (SELECT TOP (100000 ) n = ( ( Row_number()OVER (ORDER BY a.number) - 1 ) / 3 ) + 1
FROM [master]..spt_values a
CROSS JOIN [master]..spt_values b)
INSERT INTO #test
SELECT n,
Getdate() + n
FROM cte
Dense_rank Over() Method:
;WITH cte
AS (SELECT Dense_rank()OVER(partition BY NAME ORDER BY dates) AS rn,*
FROM #test)
SELECT id,
NAME,
dates
FROM cte
WHERE rn = 1;
Min Over() Method:
WITH cte
AS (SELECT Min(dates)OVER(partition BY NAME) AS max_date,*
FROM #test)
SELECT id,
NAME,
dates
FROM cte
WHERE max_date = dates
To compare the performance I checked on execution plan which said both query cost is 50%. But Max Over
method execution plan looks little complex. So any guidance here will be helpful. I don't have very good understanding in execution plan's.
Execution Plan:(6 Records)
Execution plan:(100000 records)
For 100000
records the execution plan says
Dense_rank Over() Query cost : 46%
Min Over() Query cost : 54%
Upvotes: 1
Views: 1216
Reputation: 2519
When you are doing this kind of analysis be sure to enable statistics, this data goes hand and hand with the execution plan and will bring more meaningful data to light.
set statistics io on
and set statistics time on
(just don't forget to turn them off. )
You will want to look at logical reads and scans, typically the less reads the more performing the query however, this doesn't mean it always will if the query doesn't scale.
I ran these with the stats I mentioned and Dense_Rank is clearly doing way less work and is over 100 ms faster.
Dense_Rank:
CPU time = 156 ms, elapsed time = 211 ms.
Table 'test'. Scan count 13, logical reads 359
Min:
CPU time = 389 ms, elapsed time = 317 ms
Table 'test'. Scan count 13, logical reads 359
Table 'Worktable'. Scan count 25, logical reads 203028
Upvotes: 4
Reputation: 1271231
Here are some notes, which are too long for comments:
min()
but asking about the max()
. (The two should be equivalent in terms of performance.)NULL
values in a dates
, then the results could even be different (max() over . . .
could return no rows, where dense_rank()
will always return at least one row).#test(name, dates)
, if the optimizer thinks this is desirable.So, if you want a real answer to this question, generate a bunch of data (say, at least 100,000 rows) and look at the execution plans on that data.
Upvotes: 4