O.O
O.O

Reputation: 11317

Calculate Percentile Rank using NTILE?

Need to calculate the percentile rank (1st - 99th percentile) for each student with a score for a single test.

I'm a little confused by the msdn definition of NTILE, because it does not explicitly mention percentile rank. I need some sort of assurance that NTILE is the correct keyword to use for calculating percentile rank.

declare @temp table
(
  StudentId int,
  Score int
)
insert into @temp
select 1, 20
union
select 2, 25
.....

select NTILE(100) OVER (order by Score) PercentileRank
from @temp

It looks correct to me, but is this the correct way to calculate percentile rank?

Upvotes: 6

Views: 24459

Answers (5)

David Storfer
David Storfer

Reputation: 350

NTILE is absolutely NOT the same as percentile rank. NTILE simply divides up a set of data evenly by the number provided (as noted by RoyiNamir above). If you chart the results of both functions, NTILE will be a perfectly linear line from 1-to-n, whereas percentile rank will [usually] have some curves to it depending on your data.

Percentile rank is much more complicated than simply dividing it up by N. It then takes each row's number and figures out where in the distribution it lies, interpolating when necessary (which is very CPU intensive). I have an Excel sheet of 525,000 rows and it dominates my 8-core machine's CPU at 100% for 15-20 minutes just to figure out the PERCENTRANK function for a single column.

Upvotes: 9

Alan Burstein
Alan Burstein

Reputation: 7928

I know this is an old thread but there's certainly a lot of misinformation about this topic making it's way around the internet.

NTILE is not designed for calculating percentile rank (AKA percent rank)

If you are using NTILE to calculate Percent Rank you are doing it wrong. Anyone who tells you otherwise is misinformed and mistaken. If you are using NTILE(100) and getting the correct answer its purely coincidental.

Tim Lehner explained the problem perfectly.

"It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point."

In other words, using NTILE to calculate where students rank based on their test scores can result in two students with the exact same test scores receiving different percent rank values. Conversely, two students with different scores can receive the same percent rank.

For a more verbose explanation of why NTILE is the wrong tool for this job as well as well as a profoundly better performing alternative to percent_rank see: Nasty Fast PERCENT_RANK. http://www.sqlservercentral.com/articles/PERCENT_RANK/141532/

Upvotes: 0

Tim Lehner
Tim Lehner

Reputation: 15251

One way to think of this is, "the percentage of Students with Scores below this one."

Here is one way to get that type of percentile in SQL Server, using RANK():

select *
    , (rank() over (order by Score) - 1.0) / (select count(*) from @temp) * 100 as PercentileRank
from @temp

Note that this will always be less than 100% unless you round up, and you will always get 0% for the lowest value(s). This does not necessarily put the median value at 50%, nor will it interpolate like some percentile calculations do.

Feel free to round or cast the whole expression (e.g. cast(... as decimal(4,2))) for good looking reports, or even replace - 1.0 with - 1e to force floating point calculation.

NTILE() isn't really what you're looking for in this case because it essentially divides the row numbers of an ordered set into groups rather than the values. It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point. You'd have to then additionally group by that value and grab the max or min percentile of the group to use NTILE() in the same way as we're doing with RANK().

Upvotes: 2

Tony Thampan
Tony Thampan

Reputation: 21

There is an issue with your code as NTILE distribution is not uniform. If you have 213 students, the top most 13 groups would have 3 students and the latter 87 would have 2 students each. This is not what you would ideally want in a percentile distribution.

You might want to use RANK/ROWNUM and then divide to get the %ile group.

Upvotes: 1

Andrey Gurinov
Andrey Gurinov

Reputation: 2895

Is there a typo?

select NTILE(100) OVER (order by Score) PercentileRank 
from @temp

And your script looks good. If you think something wrong there, could you clarify what excactly?

Upvotes: 1

Related Questions