Selecting top, grouping, and summing others with table variable

Question

I looked at several other questions to try to find an answer to this but I couldn't. Here's the thing, I have a REALLY big table that will keep growing indefinitely. When I say BIG I mean I have around 10 million rows for a query bounded for 6 hours of data. We have data for several months just so you can see how big it is.

Well, justified the size problem, I want to make a very simple query: Group by a column and sum the values of another column. Out of that I want the biggest 10 sums, for example, and the sum of all other not in the top 10. I know there are ways of doing this, but I would like to do it without having to compute the totals table twice. For that I used Table variables. I'm using SQL SERVER 2012.

DECLARE @sumsTable TABLE(operationName varchar(200), operationAmount int)
DECLARE @topTable TABLE(operationName varchar(200), operationAmount int)
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
DECLARE @top INTEGER

SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'

--grouping by operationName and summing occurences
INSERT INTO @sumsTable
SELECT operationName, COUNT(*) AS operationAmount
FROM [f6f87bf0-33ab-4882-8674-2cb31e5e49c4]
WHERE (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
GROUP BY operationName

--selecting top ocurrences
INSERT INTO @topTable
SELECT TOP(@top) * FROM @sumsTable
ORDER BY operationAmount DESC

--Summing others and making union with top
SELECT 'OTHER' AS operationName, SUM(operationAmount) as operationAmount FROM @sumsTable
WHERE operationName NOT IN (SELECT operationName FROM @topTable)
UNION
SELECT * FROM @topTable
ORDER BY operationAmount DESC

My question is fit this is a good way of doing it, if there are better ways, faster ways... am I committing any crimes? Can I get rid of the table variables without making all the summations more then once?

outcoldman · Accepted Answer

You can do it without temporary tables:

SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'

select 
      (case when y.RowID > @top then 'OTHER' else y.operationName end) as operationName,
      sum(y.operationAmount) as operationAmount
from
(
    select 
           row_number() over(order by count(*) desc) as RowID, 
           x.operationName, 
           count(*) AS operationAmount
    from [f6f87bf0-33ab-4882-8674-2cb31e5e49c4] as x
    where (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
    group by x.operationName
)
as y
group by (case when y.RowID > @top then 'OTHER' else y.operationName end)

Selecting top, grouping, and summing others with table variable

Answers (2)

Related Questions