m7d
m7d

Reputation: 736

Select top n percent by year

I have a query that I built that brings back net sales YTD up to the most recently completed month. The query unions totals from invoices and credit memos. It works great. I use the query in another tool that sums by cardcode and allows me to do interesting pivots and so forth. Here is that query:

select x.cardcode, 
       x.customer,
       case 
         when x.rep is null then (select slpname from ocrd inner join oslp on ocrd.slpcode = oslp.slpcode where ocrd.cardcode = x.cardcode)
         else
           x.rep
       end as rep, 
       x.city, 
       x.state, 
       x.country,
       case
         when isnumeric(x.total) = 0 then 0
         else x.total
       end as [net total],
       x.docdate

from (
  select t0.cardcode as cardcode, 
         t0.[cardname] as customer, 
         t1.city as city,
         t1.state as state,
         t1.country as country, 
         t4.slpname as rep, 
         sum(t3.linetotal) - t2.discsum as total,
         t2.docdate as [docdate]

  from ocrd t0  
       inner join crd1 t1 on (t0.cardcode = t1.cardcode and t0.shiptodef = t1.address) 
       left outer join oinv t2 on t0.cardcode = t2.cardcode 
       left outer join inv1 t3 on t2.docentry = t3.docentry 
       left outer join oslp t4 on t2.slpcode = t4.slpcode

  where t0.[cardtype] = 'C' and 
            t1.adrestype = 'S' 

  group by t0.cardcode, t0.cardname, t1.city, t1.state, t1.country, t4.slpname, t2.discsum, t2.docdate

  union all

  select t0.cardcode as cardcode, 
         t0.cardname as customer, 
         t1.city as city,
         t1.state as state, 
         t1.country as country,
         t4.slpname as rep, 
         -1*(sum(t3.linetotal) - t2.discsum) as total,
         t2.docdate

  from ocrd t0  
      inner join crd1 t1 on (t0.cardcode = t1.cardcode and t0.shiptodef = t1.address) 
      left outer join orin t2 on t0.cardcode = t2.cardcode 
      left outer join rin1 t3 on t2.docentry = t3.docentry 
      left outer join oslp t4 on t2.slpcode = t4.slpcode

  where t0.[cardtype] = 'C' and 
          t1.adrestype = 'S' 

  group by t0.cardcode, 
           t0.cardname, 
           t1.city,
           t1.state, 
           t1.country,
           t4.slpname, 
           t2.discsum,
           t2.docdate) x 

where (x.docdate between '2008/01/01' and dateadd(day, -1, '2008/' + cast(month(getdate()) as varchar(2)) + '/01')
        or x.docdate between '2009/01/01' and dateadd(day, -1, '2009/' + cast(month(getdate()) as varchar(2)) + '/01')
          or x.docdate between '2010/01/01' and dateadd(day, -1, '2010/' + cast(month(getdate())  as varchar(2)) + '/01'))

group by x.cardcode, x.customer, x.rep, x.city, x.state, x.country, x.total, x.docdate

Now, I want to modify the query to return the top n, say 20, percent of customer's net total for each year. This is where I am having trouble. I am using SQL Server so first I thought I would try using row_number() over(partition.... but I haven't got it quite right (I know its not right because I can check it against the report I am reverse engineering). Here is my first attempt:

select m.Cardcode, m.Customer, m.Rep, m.City, m.State, m.Country, m.Nettotal as 'Net Total', m.docdate as 'Posting Date'
from (
  select t.cardcode, t.customer, t.rep, t.city, t.state, t.country, t.nettotal, t.docdate, row_number() over(partition by t.docdate order by t.nettotal desc) as rownum
  from (
      select x.cardcode, 
             x.customer,
             case 
               when x.rep is null then (select slpname from ocrd inner join oslp on ocrd.slpcode = oslp.slpcode where ocrd.cardcode = x.cardcode)
               else
                 x.rep
             end as rep, 
             x.city, 
             x.state, 
             x.country,
             case
               when isnumeric(x.total) = 0 then 0
               else x.total
             end as nettotal,
             x.docdate

      from (
        select t0.cardcode as cardcode, 
               t0.[cardname] as customer, 
               t1.city as city,
               t1.state as state,
               t1.country as country, 
               t4.slpname as rep, 
               sum(t3.linetotal) - t2.discsum as total,
               t2.docdate as docdate

        from ocrd t0  
             inner join crd1 t1 on (t0.cardcode = t1.cardcode and t0.shiptodef = t1.address) 
             left outer join oinv t2 on t0.cardcode = t2.cardcode 
             left outer join inv1 t3 on t2.docentry = t3.docentry 
             left outer join oslp t4 on t2.slpcode = t4.slpcode

        where t0.[cardtype] = 'C' and 
                  t1.adrestype = 'S' 

        group by t0.cardcode, 
                 t0.cardname, 
                 t1.city, 
                 t1.state, 
                 t1.country, 
                 t4.slpname, 
                 t2.discsum, 
                 t2.docdate

        union all

        select t0.cardcode as cardcode, 
               t0.cardname as customer, 
               t1.city as city,
               t1.country as country,
               t1.state as state, 
               t4.slpname as rep, 
               -1*(sum(t3.linetotal) - t2.discsum) as total,
               t2.docdate

        from ocrd t0  
            inner join crd1 t1 on (t0.cardcode = t1.cardcode and t0.shiptodef = t1.address) 
            left outer join orin t2 on t0.cardcode = t2.cardcode 
            left outer join rin1 t3 on t2.docentry = t3.docentry 
            left outer join oslp t4 on t2.slpcode = t4.slpcode

        where t0.[cardtype] = 'C' and 
                    t1.adrestype = 'S' 

        group by t0.cardcode, 
                 t0.cardname, 
                 t1.city,
                 t1.state, 
                 t1.country,
                 t4.slpname, 
                 t2.discsum,
                 t2.docdate) x 

  where (x.docdate between '2008/01/01' and dateadd(day, -1, '2008/' + cast(month(getdate()) as varchar(2)) + '/01')
            or x.docdate between '2009/01/01' and dateadd(day, -1, '2009/' + cast(month(getdate()) as varchar(2)) + '/01')
              or x.docdate between '2010/01/01' and dateadd(day, -1, '2010/' + cast(month(getdate())  as varchar(2)) + '/01'))

  group by x.cardcode, 
           x.customer, 
           x.rep, 
           x.city, 
           x.state, 
           x.country, 
           x.total, 
           x.docdate) as t
) as m

where rownum <= 20

Going down this road is troublesome even if I got it right because it doesn't allow me to get the top n percent, just the top n.

I have yet to try using cross apply or sub selects to achieve the result I desire.

Can someone help me get this right? Also, how it is written is probably not efficient and the hard coded date range selection is not a good solution. I guess there is a lot to improve :)

Your help is appreciated.

Upvotes: 2

Views: 1964

Answers (2)

Manfred Sorg
Manfred Sorg

Reputation: 1890

If you need a more individual percentage (let's say 17%) you can use row_number and count:

with cSalesPerYear as (
    select  s.Year, c.Customer,
            RankNo = rank() over (partition by s.Year order by S.Amount desc),
            RowNo = row_number() over (partition by s.Year order by S.Amount desc),
            CountOrders = count() over (partition by s.Year)
    from    dbo.Customers c
    inner join dbo.Sales s
        on  s.CustomerID = c.CustomerID
)
select  *
from    cSalesPerYear
where   RowNo <= @Percentage * CountOrders
     -- RankNo <= @Percentage * CountOrders --<-- "with ties" version

Upvotes: 2

marc_s
marc_s

Reputation: 755157

You can use one of the ranking functions in SQL Server - but not ROW_NUMBER() but NTILE() instead.

NTILE() will break up a result set into as many chunks of data that you specify - since you want to the top 20%, you'll probably use NTILE(5).

So your CTE should look something like this:

WITH CustomerPerYear AS
(
   SELECT 
       c.Name, s.Sales, 
       NTILE(5) OVER (PARTITION BY c.CustomerID ORDER BY s.Amount DESC) as 'NTile'
   FROM
       dbo.Customer c
   INNER JOIN
       dbo.Sales s ON s.CustomerID = c.CustomerID
)
SELECT *
FROM CustomerPerYear
WHERE NTile = 1

So basically you're partioning your data by customer, and then you're ranking each customers' sales into 5 NTile groups ordered by the sales amount. The NTILE = 1 is the top 20% of your sales, for each customer.

See the MSDN docs on NTILE for more details, if needed.

Upvotes: 1

Related Questions