Reputation: 332
I want to select a top group of rows, based on the rank of an aggregate of one of the columns. As an example, lets say I have the following query:
SELECT CustomerID, ItemID, Price * Qty as Revenue
FROM InvoiceTable
WHERE InvoiceDate between @StartDate, @EndDate
Pretty straight forward here, a query that returns every line item of every invoice between a set of dates. Now, what I would like to do is limit this so that only the items belonging to the top 10 Customers, based on the sum of Revenue are returned. Is there a straight forward way to do this?
I can always dump the results of the above query into a temporary table, run a second pass to compute the sum for each customer and store that in another column of the temporary table, and then select using a RANK OVER clause, but this seems a little convoluted, and I was hoping there was a more straight forward way to accomplish my goal.
To clarify, the query above is very very simplified, and it is not going to be practical to pre-compute a list of the top 10 clients before actually generating the list (the actual query consists of a number of unions and other conditions), which is why I am looking to either limit it in the query itself, or as a post-computational step.
Upvotes: 0
Views: 2234
Reputation: 71
Your question is (sadly) really db-engine dependent. Since you tagged for Transact-SQL, I'll answer it there first:
T-SQL has the SELECT TOP N feature, that allows to get the first N rows, if it's coupled to an ORDER BY clause, it will first order and give you the TOP N of that. Transact also allows ordering by aggregated fields if you alias them. Your question ask a compound of two things: Items and "top buyers"; so we should tacle the buyers first since they are the filter:
SELECT TOP 10 CustomerID, SUM(Price * Qty) as SumRevenue
FROM InvoiceTable
WHERE InvoiceDate between @StartDate, @EndDate
GROUP BY CustomerID
ORDER BY SumRevenue DESC
Thing is, you wanted the item list of this Customers. I'm gonna take the liberty of guessing you want the sum revenue per item per customer here, so you now need to use this as a subquery to get your items. I won't get into the in vs inner join debate, so i'll just put an example:
SELECT r.CustomerID, r.ItemID, SUM(r.Price * r.Qty) ItemRevenue
FROM InvoiceTable
INNER JOIN (SELECT TOP 10 i.CustomerID, SUM(i.Price * i.Qty) as SumRevenue
FROM InvoiceTable i
WHERE i.InvoiceDate between @StartDate, @EndDate
GROUP BY i.CustomerID
ORDER BY SumRevenue DESC) s
ON r.CustomerID = s.CustomerID
WHERE r.InvoiceDate between @StartDate, @EndDate
GROUP BY r.CustomerID, r.ItemID
If you work with MYSQL, the TOP N idiom is replaced with the LIMIT N idiom, so it would result in:
SELECT r.CustomerID, r.ItemID, SUM(r.Price * r.Qty) ItemRevenue
FROM InvoiceTable
INNER JOIN (SELECT i.CustomerID, SUM(i.Price * i.Qty) as SumRevenue
FROM InvoiceTable i
WHERE i.InvoiceDate between @StartDate, @EndDate
GROUP BY i.CustomerID
ORDER BY SumRevenue DESC
LIMIT 10) s
ON r.CustomerID = s.CustomerID
WHERE r.InvoiceDate between @StartDate, @EndDate
GROUP BY r.CustomerID, r.ItemID
If you work with Oracle, the query is a bit harder since you don't have the TOP N idiom and you'll have to work with ROWNUM and subqueries, so I recommend an approach like this:
SELECT m.CustomerID, m.ItemID, m.ItemRevenue
(SELECT ROWNUM, r.CustomerID, r.ItemID, SUM(r.Price * r.Qty) ItemRevenue
FROM InvoiceTable
INNER JOIN (SELECT i.CustomerID, SUM(i.Price * i.Qty) as SumRevenue
FROM InvoiceTable i
WHERE i.InvoiceDate between @StartDate, @EndDate
GROUP BY i.CustomerID) s
ON r.CustomerID = s.CustomerID
WHERE r.InvoiceDate between @StartDate, @EndDate
GROUP BY r.CustomerID, r.ItemID
ORDER BY s.SumRevenue DESC) m
WHERE ROWNUM < 11
Upvotes: 2
Reputation: 1269553
You can do this using window functions:
select it.*
from (select it.*, dense_rank() over (order by TotalRevenue) as seqnum
from (select it.*, sum(Price*Qty) over (order by customerid) as TotalRevenue
from InvoiceTable it
where InvoiceDate between @StartDate, @EndDate
) it
) it
where seqnum <= 10;
Upvotes: 1
Reputation: 6771
Would a cte make it simple enough, based on the fact that you have a complex query?:
WITH cte
AS ( SELECT TOP 10
CustomerID ,
SUM(Price * Qty) AS TotalRevenue
FROM InvoiceTable
WHERE InvoiceDate BETWEEN @startdate AND @EndDate
GROUP BY CustomerID
ORDER BY Price * Qty DESC
)
SELECT it.CustomerID ,
it.ItemID ,
it.Price * it.Qty AS Revenue
FROM InvoiceTable it
INNER JOIN cte ON it.CustomerID = cte.CustomerID
WHERE it.InvoiceDate BETWEEN @StartDate AND @EndDate
Upvotes: 1