Umar.H
Umar.H

Reputation: 23099

SQL to Group By top n occurring SKUs

I have a table like so called billing

sku_code | invoice_id | sku_amount | sku_sale
integer  | integer    | float      | float

What I want do is first find the top 350 sku_code by sku_amount

SELECT TOP 350 Sum(sku_amount) AS amt, 
       sku_code
FROM   billing 
GROUP  BY sku_code
ORDER  BY amt DESC 

I then want to slice the entire table by the table by the 350 sku_code that appear above

I assume this would be some sort of inner-join via a sub-query but I can't figure out the syntax.

The query I want to finally run is but with it only returning the 350 sku_codes from the above query.

SELECT sum(sku_amount) as amt,sku_code, invoice_id
from billing 

group by sku_code, invoice_id

order by amt DESC

This is giving me a table that will look like which should have around 20-30 million rows.

amt | sku_code | invoice_id

Upvotes: 1

Views: 250

Answers (4)

Ullas
Ullas

Reputation: 11556

You can achieve this by giving a row_number for the rows within CTE and use the rows having the row_number less than or equal to 350.

Query

;with cte as(
  select [rn] = row_number() over(
    partition by [sku_code]
    order by [amount] desc
  ), *
  from [bill]
)
select sum([sku_amount]) as [amount]
from cte
where [rn] <= 350;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270503

If I understand correctly, you want the original rows for the top 350 skus. A JOIN should suffice:

SELECT b.*
FROM billing b JOIN
     (SELECT TOP 350 Sum(sku_amount) AS amt, 
             sku_code
      FROM billing 
      GROUP BY sku_code
      ORDER BY amt DESC 
     ) s
     ON s.sku_code = b.sku_code

Upvotes: 1

Barbaros &#214;zhan
Barbaros &#214;zhan

Reputation: 65363

If your DB version is 2012+, then use

OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
    [
      FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
    ]

syntax as

SELECT SUM(sku_amount) AS amt, sku_code, invoice_id
  FROM billing 
 GROUP BY sku_code, invoice_id  
 ORDER BY ROW_NUMBER() OVER (ORDER BY SUM(sku_amount) DESC)
 OFFSET 0 ROWS
 FETCH NEXT 350 ROWS ONLY

by using window analytic function ( row_number() in this case ) in the order by clause without need of a subquery

Demo

Upvotes: 1

Error_2646
Error_2646

Reputation: 3801

If I'm understanding right you can just use rank. This will give you all records which have a sku_amount in the top 350

SELECT * 
  FROM (
    SELECT billing.*,
           RANK() OVER ( PARTITION BY 1 ORDER BY sku_amount DESC ) rnk
    FROM   billing 
    ) TMP
 WHERE rnk <= 350

Upvotes: 1

Related Questions