Reputation: 2616
I have a table like below:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 10
R1 C1 M1 B1 2017 20
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
...
I wrote the query below to aggregate them:
SELECT [Region]
,[Country]
,[Manufacturer]
,[Brand]
,Period
,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
ORDER BY 1,2,3,4
which yields something like below:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 30 -- this row is an aggregate from raw table above
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 4 -- aggregated result
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
I'd like to add another column to the above table that shows the DISTINCT COUNT
of Brand
grouped by Region
,Country
,Manufacturer
and Period
. So the final table would become as follow:
Region Country Manufacturer Brand Period Spend UniqBrandCount
R1 C1 M1 B1 2016 5 2 -- two brands by R1, C1, M1 in 2016
R1 C1 M1 B1 2017 30 1
R1 C1 M1 B2 2016 15 2 -- same as first row's result
R1 C1 M1 B3 2017 20 1
R1 C2 M1 B1 2017 4 1
R1 C2 M2 B4 2017 25 2
R1 C2 M2 B5 2017 30 2
R2 C3 M2 B4 2017 40 2
R2 C3 M2 B5 2017 45 2
I know how to get to final result in three steps.
Run this query (Query #1):
SELECT [Region] ,[Country] ,[Manufacturer] ,[Period] ,COUNT(DISTINCT [Brand]) AS [BrandCount] INTO Temp1 FROM myTable GROUP BY [Region] ,[Country] ,[Manufacturer] ,[Period]
Run this query (Query #2)
SELECT [Region] ,[Country] ,[Manufacturer] ,[Brand] ,YEAR([Period]) AS Period ,SUM([Spend]) AS [Spend] INTO Temp2 FROM myTable GROUP BY [Region] ,[Country] ,[Manufacturer] ,[Brand] ,[Period]
Then LEFT JOIN
Temp2
and Temp1
to bring in [BrandCount]
from the latter like below:
SELECT a.* ,b.* FROM Temp2 AS a LEFT JOIN Temp1 AS b ON a.[Region] = b.[Region] AND a.[Country] = b.[Country] AND a.[Advertiser] = b.[Advertiser] AND a.[Period] = b.[Period]
I'm pretty sure there is a more efficient way to do this, is there? Thank you in advance for your suggestions/answers!
Upvotes: 2
Views: 455
Reputation: 5452
Borrowing heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over
Count Distinct doesn't work, so dense_rank is required. Ranking the brands in forward and then reverse order, and then subtracting 1 gives the distinct count.
Your sum function can also be rewritten using PARTITION BY
logic. This way you can use different grouping levels for each aggregation:
SELECT
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand)
+ dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand Desc)
- 1
AS [BrandCount]
,SUM([Spend]) OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4
You may then need to reduce the number of rows in your output, as this syntax gives the same number of rows as myTable, but with the aggregation totals appearing on each row they apply to:
R1 C1 M1 B1 2016 2 5
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B2 2016 2 15
R1 C1 M1 B3 2017 2 20
R1 C2 M1 B1 2017 1 5
R1 C2 M2 B4 2017 2 25
R1 C2 M2 B5 2017 2 30
R2 C3 M1 B1 2017 1 35
R2 C3 M2 B4 2017 2 40
R2 C3 M2 B5 2017 2 45
Selecting distinct rows from this output gives you what you need.
Consider this data:
Col1 Col2
B 1
B 1
B 3
B 5
B 7
B 9
dense_rank() ranks data according to the number of distinct items before the current one, plus 1. So:
1->1, 3->2, 5->3, 7->4, 9->5.
In reverse order (using desc
) this yields the reverse pattern:
1->5, 3->4, 5->3, 7->2, 9->1:
Adding these ranks together gives the same value:
1+5 = 2+4 = 3+3 = 4+2 = 5+1 = 6
The wording is helpful here,
(number of distinct items before + 1) + (number of distinct items after + 1)
= number of distinct OTHER items before AND after + 2
= Total number of distinct items + 1
So to get the total number of distinct items, add the ascending
and descending
dense_ranks together and subtract 1.
Upvotes: 2
Reputation: 453736
The double dense_rank
idea means that you need two sorts (assuming no index exists that provides sort order). Assuming no NULL
brands (as that idea does) you can use a single dense_rank
and a windowed MAX
as below (demo)
WITH T1
AS (SELECT *,
DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
FROM myTable),
T2
AS (SELECT *,
MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
FROM T1)
SELECT [Region],
[Country],
[Manufacturer],
[Brand],
Period,
SUM([Spend]) AS [Spend],
MAX(UniqBrandCount) AS UniqBrandCount
FROM T2
GROUP BY [Region],
[Country],
[Manufacturer],
[Brand],
[Period]
ORDER BY [Region],
[Country],
[Manufacturer],
[Period],
Brand
The above has some inevitable spooling (it isn't possible to do this in a 100% streaming manner) but a single sort.
Strangely the final order by clause is needed to keep the number of sorts down to one (or zero if a suitable index exists).
Upvotes: 1
Reputation: 175
The tag to your question;
window-functions
suggests you have a pretty good idea.
For DISTINCT COUNT of Brand grouped by Region,Country,Manufacturer and Period: you may write:
Select Region
,Country
,Manufacturer
,Brand
,Period
,Spend
,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc)
+ DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc)
-1 UniqBrandCount
From myTable T1
Order By 1,2,3,4
Upvotes: 2