sam
sam

Reputation: 149

Calculating the 95th Percentile value, but not necessarily from the dataset

I am trying to figure out how to calculate the 95th percentile value off of a given dataset. I can use the T-SQL below to get the 95th percentile value from the data-set, but the code below always gives me the value from the data-set.

SELECT MIN(Value) 
FROM
(
    SELECT TOP 95 PERCENT WITH TIES Value
    FROM [Sample].[dbo].[numbers]
    ORDER BY Value DESC
) AS Temp

For example, for the data-set 23, 5, 11, 22, 25, 14, the 95th percentile value is 24.5 per Excel, but SQL Server is not going to return this value as it is not in the data-set.

Can anyone please advise if they have any suggestions?

Thanks in advance for helping out.

Upvotes: 2

Views: 3741

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

It looks like you want an "interpolated" median. Yucch. But, you can do this with SQL. Just be explicit in the calculation:

select (max(case when rownum <= 0.95 * total then value end) +
        min(case when rownum >= 0.95 * total then value end)
       ) / 2.0 as Interpolated_95th
from (SELECT n.*, row_number() over (order by value) as rownum,
             count(*) over () as total
      FROM [Sample].[dbo].[numbers] n
     ) t

Note that this works in the case where there are exactly a multiple of 20 rows (so the 95th percentile would be at 19x position), because of the equality -- min() and max() would be the same. If also works in other cases, always returning the point half-way in-between the two values.

Note: I don't really think this is any more valid than the value that SQL returns. By the definition of median and percentile, any value between the two is equally valid, and there is a good argument for using either extreme. That said, I can greatly appreciate that in the real world, you might have to do this.

Upvotes: 1

Related Questions