Bee
Bee

Reputation: 125

Frequency distribution by column in SQL Server

I have a table with the following structure

  year    age   score    weight
  ------------------------------
  2008    16    100      3
  2008    25    150      2
  2009    40    210      2
  2009    22    50       3
  2009    65    90       3

I need to find the frequency distribution of score by age.

Expected output is the distribution of score by age in %: (Not accurate output - for depiction only. the row totals 1)

                   0-50  51-100  101-150  151-200  201-250
 2008   16-35      0.3    0.2      0.1     0.4      0
 2008   36-40 
 2008   41-65
 2008   66+
 2008   All
 2009   16-35
 2009   36-40
 2009   41-65
 2009   66+

I came up with the below query say for one group

select 
    yearofsale,
    sum(sum(case 
               when age between 16 and 35 AND score between 0 and 50 
                  then 1 
               else 0 
            end) * weight) / sum(case 
                                    when age between 16 and 35 and score between 0 and 50 
                                       then weight 
                                    else 1 end)
from 
    table
group by 
    yearofsale

but I'm pretty sure there is an easier way to do this. Any idea?

Thanks, bee

Upvotes: 4

Views: 5784

Answers (2)

John Cappelletti
John Cappelletti

Reputation: 81950

Here is a dynamic approach where you'll get the FULL Matrix.

The matrix tiers are stored in a table, and by adjusting the parameters you can change the axis and even the source.

Consider the following:

Declare @Source varchar(150)= 'YourTable'
Declare @KeyCol varchar(150)= 'Year'
Declare @YTier  varchar(50) = 'Age'
Declare @YMeas  varchar(50) = 'Age'
Declare @XTier  varchar(50) = 'Score'
Declare @XMeas  varchar(50) = 'Score'


Declare @SQL varchar(max) = '
;with cte1 as (
      Select '+@KeyCol+'
            ,YSeq   = max(Y.Seq)
            ,YTitle = max(Y.Title)
            ,XSeq   = max(X.Seq)
            ,XTitle = max(X.Title)
            ,Value  = sum(Weight) 
       From  '+@Source+' A
       Join  Tier Y on (Y.Tier='''+@YTier+''' and A.'+@YMeas+'   between Y.R1 and Y.R2)
       Join  Tier X on (X.Tier='''+@XTier+''' and A.'+@XMeas+' between X.R1 and X.R2)
       Group By '+@KeyCol+',Y.Seq,X.Seq
       Union All
       Select '+@KeyCol+'
             ,YSeq   = Y.Seq
             ,YTitle = Y.Title
             ,XSeq   = X.Seq
             ,XTitle = X.Title
             ,Value  = 0
       From  (Select Distinct '+@KeyCol+' from '+@Source+') A
       Cross Join (Select Distinct Seq,Title From Tier where Tier='''+@YTier+''') Y
       Cross Join (Select Distinct Seq,Title From Tier where Tier='''+@XTier+''') X )
, cte2 as (Select '+@KeyCol+',YSeq,RowTotal=sum(Value) from cte1 Group By '+@KeyCol+',YSeq)
, cte3 as (Select A.*
                 ,PctRow = Format(case when B.RowTotal=0 then 0 else (A.Value*100.0)/B.RowTotal end,''#0.0'')
           From   cte1 A
           Join   cte2 B on A.'+@KeyCol+'=B.'+@KeyCol+' and A.YSeq=B.YSeq )
 Select * 
  Into #Temp 
  From cte3

Declare @SQL2 varchar(max) = Stuff((Select '','' + QuoteName(Title) From Tier where Tier='''+@XTier+''' Order by Seq For XML Path('''')),1,1,'''') 
Select  @SQL2 = ''
Select ['+@KeyCol+'],[YTitle] as '+@YTier+','' + @SQL2 + ''
 From  (Select '+@KeyCol+',YSeq,YTitle,XTitle,PctRow=max(PctRow) from #Temp Group BY '+@KeyCol+',YSeq,YTitle,XTitle) A
 Pivot (max(PctRow) For [XTitle] in ('' + @SQL2 + '') ) p''
Exec(@SQL2);
'
Exec(@SQL)

Returns

enter image description here

The tiers are stored in a general structure. This allows for multiple versions. The tier table looks like this:

enter image description here

Upvotes: 2

trincot
trincot

Reputation: 350212

You could first perform a sub query that adds the age group and score group information to each record.

Then you could add to that the total weight per year and age group.

After that you would do the calculation of the percentage per age and score group. And finally you pivot that result into the cross table:

select yearofsale, age_group, [0-50], [51-100], [101-150], [151-200], [201-250]
from (
    select yearofsale, age_group, score_group, 100.0*sum(weight) / min(total_weight) as pct
    from (
            select *, 
                   sum(weight) over (partition by yearofsale, age_group) as total_weight
            from (
                    select *,
                        case when age >= 66 then '66+'
                             when age >= 41 then '41-65'
                             when age >= 36 then '36-40'
                             when age >= 16 then '16-35' 
                        end as age_group,
                        case when score <  51 then '0-50'
                             when score < 101 then '51-100' 
                             when score < 151 then '101-150' 
                             when score < 201 then '151-200' 
                             when score < 251 then '201-250'
                        end as score_group
                    from
                        table) as base
            ) as base2
    group by yearofsale, age_group, score_group) as base3
pivot ( sum(pct)
    for score_group in ([0-50], [51-100], [101-150], [151-200], [201-250])
) as pivotTable

The output for the sample data in the original question is:

 yearofsale | age_group |  0-50  | 51-100 | 101-150 | 151-200 | 201-250
------------+-----------+--------+--------+---------+---------+---------
    2008    |   16-35   |  NULL  |  60.00 |   40.00 |   NULL  |   NULL
    2009    |   16-35   | 100.00 |  NULL  |   NULL  |   NULL  |   NULL
    2009    |   36-40   |  NULL  |  NULL  |   NULL  |   NULL  |  100.00
    2009    |   41-65   |  NULL  | 100.00 |   NULL  |   NULL  |   NULL

Upvotes: 2

Related Questions