Reputation: 125
I have a table with the following structure
year age score weight
------------------------------
2008 16 100 3
2008 25 150 2
2009 40 210 2
2009 22 50 3
2009 65 90 3
I need to find the frequency distribution of score by age.
Expected output is the distribution of score by age in %: (Not accurate output - for depiction only. the row totals 1)
0-50 51-100 101-150 151-200 201-250
2008 16-35 0.3 0.2 0.1 0.4 0
2008 36-40
2008 41-65
2008 66+
2008 All
2009 16-35
2009 36-40
2009 41-65
2009 66+
I came up with the below query say for one group
select
yearofsale,
sum(sum(case
when age between 16 and 35 AND score between 0 and 50
then 1
else 0
end) * weight) / sum(case
when age between 16 and 35 and score between 0 and 50
then weight
else 1 end)
from
table
group by
yearofsale
but I'm pretty sure there is an easier way to do this. Any idea?
Thanks, bee
Upvotes: 4
Views: 5784
Reputation: 81950
Here is a dynamic approach where you'll get the FULL Matrix.
The matrix tiers are stored in a table, and by adjusting the parameters you can change the axis and even the source.
Consider the following:
Declare @Source varchar(150)= 'YourTable'
Declare @KeyCol varchar(150)= 'Year'
Declare @YTier varchar(50) = 'Age'
Declare @YMeas varchar(50) = 'Age'
Declare @XTier varchar(50) = 'Score'
Declare @XMeas varchar(50) = 'Score'
Declare @SQL varchar(max) = '
;with cte1 as (
Select '+@KeyCol+'
,YSeq = max(Y.Seq)
,YTitle = max(Y.Title)
,XSeq = max(X.Seq)
,XTitle = max(X.Title)
,Value = sum(Weight)
From '+@Source+' A
Join Tier Y on (Y.Tier='''+@YTier+''' and A.'+@YMeas+' between Y.R1 and Y.R2)
Join Tier X on (X.Tier='''+@XTier+''' and A.'+@XMeas+' between X.R1 and X.R2)
Group By '+@KeyCol+',Y.Seq,X.Seq
Union All
Select '+@KeyCol+'
,YSeq = Y.Seq
,YTitle = Y.Title
,XSeq = X.Seq
,XTitle = X.Title
,Value = 0
From (Select Distinct '+@KeyCol+' from '+@Source+') A
Cross Join (Select Distinct Seq,Title From Tier where Tier='''+@YTier+''') Y
Cross Join (Select Distinct Seq,Title From Tier where Tier='''+@XTier+''') X )
, cte2 as (Select '+@KeyCol+',YSeq,RowTotal=sum(Value) from cte1 Group By '+@KeyCol+',YSeq)
, cte3 as (Select A.*
,PctRow = Format(case when B.RowTotal=0 then 0 else (A.Value*100.0)/B.RowTotal end,''#0.0'')
From cte1 A
Join cte2 B on A.'+@KeyCol+'=B.'+@KeyCol+' and A.YSeq=B.YSeq )
Select *
Into #Temp
From cte3
Declare @SQL2 varchar(max) = Stuff((Select '','' + QuoteName(Title) From Tier where Tier='''+@XTier+''' Order by Seq For XML Path('''')),1,1,'''')
Select @SQL2 = ''
Select ['+@KeyCol+'],[YTitle] as '+@YTier+','' + @SQL2 + ''
From (Select '+@KeyCol+',YSeq,YTitle,XTitle,PctRow=max(PctRow) from #Temp Group BY '+@KeyCol+',YSeq,YTitle,XTitle) A
Pivot (max(PctRow) For [XTitle] in ('' + @SQL2 + '') ) p''
Exec(@SQL2);
'
Exec(@SQL)
Returns
The tiers are stored in a general structure. This allows for multiple versions. The tier table looks like this:
Upvotes: 2
Reputation: 350212
You could first perform a sub query that adds the age group and score group information to each record.
Then you could add to that the total weight per year and age group.
After that you would do the calculation of the percentage per age and score group. And finally you pivot that result into the cross table:
select yearofsale, age_group, [0-50], [51-100], [101-150], [151-200], [201-250]
from (
select yearofsale, age_group, score_group, 100.0*sum(weight) / min(total_weight) as pct
from (
select *,
sum(weight) over (partition by yearofsale, age_group) as total_weight
from (
select *,
case when age >= 66 then '66+'
when age >= 41 then '41-65'
when age >= 36 then '36-40'
when age >= 16 then '16-35'
end as age_group,
case when score < 51 then '0-50'
when score < 101 then '51-100'
when score < 151 then '101-150'
when score < 201 then '151-200'
when score < 251 then '201-250'
end as score_group
from
table) as base
) as base2
group by yearofsale, age_group, score_group) as base3
pivot ( sum(pct)
for score_group in ([0-50], [51-100], [101-150], [151-200], [201-250])
) as pivotTable
The output for the sample data in the original question is:
yearofsale | age_group | 0-50 | 51-100 | 101-150 | 151-200 | 201-250
------------+-----------+--------+--------+---------+---------+---------
2008 | 16-35 | NULL | 60.00 | 40.00 | NULL | NULL
2009 | 16-35 | 100.00 | NULL | NULL | NULL | NULL
2009 | 36-40 | NULL | NULL | NULL | NULL | 100.00
2009 | 41-65 | NULL | 100.00 | NULL | NULL | NULL
Upvotes: 2