Reputation: 148524
I've been summarizing for myself - the window functions in SQL server.
But it's also very important for me - no just to know how to write functions , but also - in what scenarios should I use them.
Well I can honestly say I don't have an answer for this for the Rank
function where it is perfectly clear in what scenarios I should use DENSE_RANK
:
Example : there are 5
competitors and 2
of them finished the end line at the same time - so both get same rank ( with DENSE_RANK)- both will get 1
. The other 3
competitors (that finished also the same time) would get rank 2
( etc...)
Question:
In what scenarios should I use the Rank
function , I know how numbers are generated but I don't see any scenario for using it
Visualization example :
DECLARE @t TABLE(NAME NVARCHAR(MAX),val money)
insert INTO @t SELECT 'a',100
insert INTO @t SELECT 'a',100
insert INTO @t SELECT 'a',100
insert INTO @t SELECT 'a',100
insert INTO @t SELECT 'b',200
insert INTO @t SELECT 'b',200
insert INTO @t SELECT 'd',400
insert INTO @t SELECT 'e',500
insert INTO @t SELECT 'f',600
select Name,
val,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY name),
val/ SUM(val) OVER(PARTITION BY NAME ) AS '1AgainstTotalHimself',
val/ SUM(val) OVER( ) AS '1AgainstOthers' ,
NTILE(2) OVER ( PARTITION BY NAME ORDER BY name) AS 'ntile2' ,
NTILE(2) OVER ( ORDER BY name) AS 'ntile' , -- ( 9%2=1 , so group #1 will get more number)
RANK( ) OVER ( ORDER BY name ) AS Rank,
DENSE_RANK( ) OVER (ORDER BY name) AS DENSERANK
from @t
Upvotes: 2
Views: 241
Reputation: 148524
After talking with Aaron Bertrand - I finally got the "human worded description" for when to use Rank :
dense_rank()
is when 4 people tied for first, the next-best is in second.rank()
is when 4 people tied for first, the next-best if fifth. There isn't really any technical reason you would use one over the other, it's how you want the numbers displayed and consumed. In some sports, for example, they want a nice clean 1st, 2nd, 3rd. In others, they want your rank to reflect(!!!) how many people finished ahead(!!!) of you (well, minus one, depending on your perspective).
So with Rank , I dont care about being at first place or second place in a 1,2,3 world" - It is more competitive : the number itself represents also the weight of being second.( for example)
so if 100 people win at 05:00 and i'm after them at 05:01 - then they all will be first place in DenseRank ( and i'll be at second place). but with Rank - they all be 1 and I'll be 101. so it's adding a "weight" of being second.
Also , now I see where the word dense comes from. it's removing the "weight" .
Another interpretation :
With dense_rank : looking at 2 , doesn't mean i'm almost the best. there could be 100 people who finished at 05:00
With rank : looking at 2 , means truly i'm almost the best.
Upvotes: 2
Reputation: 77677
RANK()
can be seen as an equivalent of TOP (n) WITH TIES
, except the latter will get you the specified rows over the entire set only while the former can also do so over partitions.
So, if you needed to select, say, top five results per group but include all rows, if any, that tied with the #5 row in each group, RANK()
would solve this for you:
WITH ranked AS (
SELECT
EmployeeID,
FirstName,
LastName,
DepartmentID,
Salary,
rnk = RANK() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC)
FROM
dbo.Employee
)
SELECT
EmployeeID,
FirstName,
LastName,
DepartmentID,
Salary
FROM
ranked
WHERE
rnk <= 5
;
Upvotes: 3