Reputation: 752
I looked at several other questions to try to find an answer to this but I couldn't. Here's the thing, I have a REALLY big table that will keep growing indefinitely. When I say BIG I mean I have around 10 million rows for a query bounded for 6 hours of data. We have data for several months just so you can see how big it is.
Well, justified the size problem, I want to make a very simple query: Group by a column and sum the values of another column. Out of that I want the biggest 10 sums, for example, and the sum of all other not in the top 10. I know there are ways of doing this, but I would like to do it without having to compute the totals table twice. For that I used Table variables. I'm using SQL SERVER 2012.
DECLARE @sumsTable TABLE(operationName varchar(200), operationAmount int)
DECLARE @topTable TABLE(operationName varchar(200), operationAmount int)
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
DECLARE @top INTEGER
SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'
--grouping by operationName and summing occurences
INSERT INTO @sumsTable
SELECT operationName, COUNT(*) AS operationAmount
FROM [f6f87bf0-33ab-4882-8674-2cb31e5e49c4]
WHERE (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
GROUP BY operationName
--selecting top ocurrences
INSERT INTO @topTable
SELECT TOP(@top) * FROM @sumsTable
ORDER BY operationAmount DESC
--Summing others and making union with top
SELECT 'OTHER' AS operationName, SUM(operationAmount) as operationAmount FROM @sumsTable
WHERE operationName NOT IN (SELECT operationName FROM @topTable)
UNION
SELECT * FROM @topTable
ORDER BY operationAmount DESC
My question is fit this is a good way of doing it, if there are better ways, faster ways... am I committing any crimes? Can I get rid of the table variables without making all the summations more then once?
Upvotes: 1
Views: 1263
Reputation: 1001
with following sql, you only need aggregate the raw table once
rather than
row_number() over(order by count(*) desc) as RowID, x.operationName, count(*) AS operationAmount
which does count(*) twice
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
DECLARE @top INTEGER
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'
;WITH cte AS ( -- get sum for all operations
SELECT operationName, COUNT(*) AS operationAmount
FROM [f6f87bf0-33ab-4882-8674-2cb31e5e49c4]
WHERE (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
GROUP BY operationName
),
cte1 AS ( -- rank totals
SELECT operationName, operationAmount, ROW_NUMBER()OVER (ORDER BY operationAmount DESC) AS RN
FROM cte
) -- get top 10 and others
SELECT (CASE WHEN RN < 10 THEN operationName ELSE 'Others' END) Name, SUM(operationAmount)
FROM cte1
GROUP BY (CASE WHEN RN < 10 THEN operationName ELSE 'Others' END)
Upvotes: 0
Reputation: 11832
You can do it without temporary tables:
SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'
select
(case when y.RowID > @top then 'OTHER' else y.operationName end) as operationName,
sum(y.operationAmount) as operationAmount
from
(
select
row_number() over(order by count(*) desc) as RowID,
x.operationName,
count(*) AS operationAmount
from [f6f87bf0-33ab-4882-8674-2cb31e5e49c4] as x
where (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
group by x.operationName
)
as y
group by (case when y.RowID > @top then 'OTHER' else y.operationName end)
Upvotes: 2