Aaj
Aaj

Reputation: 259

SQL Percercentile Calculation

I have the following query, which even without a ton of data (~3k rows) is still a bit slow to execute, and the logic is a bit over my head - was hoping to get some help optimizing the query or even an alternate methodology:

Select companypartnumber, (PartTotal + IsNull(Cum_Lower_Ranks, 0) ) / Sum(PartTotal) over() * 100 as Cum_PC_Of_Total
FROM PartSalesRankings PSRMain
Left join 
    (
        Select PSRTop.Item_Rank, Sum(PSRBelow.PartTotal) as Cum_Lower_Ranks
          from partSalesRankings PSRTop
          Left join PartSalesRankings PSRBelow on PSRBelow.Item_Rank < PSRTop.Item_Rank
         Group by PSRTop.Item_Rank 
    ) as PSRLowerCums on PSRLowerCums.Item_Rank = PSRMain.Item_Rank

The PartSalesRankings table simply consists of CompanyPartNumber(bigint) which is a part number designation, PartTotal(decimal 38,5) which is the total sales, and Item_Rank(bigint) which is the rank of the item based on total sales.

I'm trying to end up with my parts into categories based on their percentile - so an "A" item would be top 5%, a "B" item would be the next 15%, and "C" items would be the lower 80th percentile. The view I created works fine, it just takes almost three seconds to execute, which for my purposes is quite slow. I narrowed the bottle neck to the above query - any help would be greatly appreciated.

Upvotes: 0

Views: 174

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

The problem you are having is the calculation of the cumulative sum of PartTotal. If you are using SQL Server 2012, you can do something like:

select (case when ratio <= 0.05 then 'A'
             when ratio <= 0.20 then 'B'
             else 'C'
        end),
       t.*
from (select psr.companypartnumber,
             (sum(PartTotal) over (order by PartTotal) * 1.0 / sum(PartTotal) over ()) as ratio
      FROM PartSalesRankings psr
     ) t

SQL Server 2012 also have percentile functions and other functions not in earlier versions.

In earlier versions, the question is how to get the cumulative sum efficiently. Your query is probably as good as anything that can be done in one query. Can the cumulative sum be calculated when partSalesRankings is created? Can you use temporary tables?

Upvotes: 1

Related Questions