Alex Gordon
Alex Gordon

Reputation: 60731

doing a simple pivot on year

I have a table:

+----+-------+------+
| id | times | year |
+----+-------+------+
|  5 |     2 | 2008 |
|  6 |    76 | 2008 |
|  2 |    43 | 2009 |
|  4 |     5 | 2009 |
|  1 |     3 | 2010 |
|  9 |     6 | 2010 |
|  7 |   444 | 2011 |
|  8 |     3 | 2011 |
|  3 |    65 | 2012 |
+----+-------+------+

I would like to create a pivot out of this table which buckets times per year :

+--------+------+------+------+------+------+
|        | 2008 | 2009 | 2010 | 2011 | 2012 |
+--------+------+------+------+------+------+
| 0      |      |      |      |      |      |
| 1-30   |    1 |    1 |    2 |    1 |      |
| 31-60  |      |    1 |      |      |      |
| 61-90  |    1 |      |      |      |    1 |
| 91-120 |      |      |      |      |      |
| 121+   |      |      |      |    1 |      |
+--------+------+------+------+------+------+

how do i start to tackle this challenge with sql? thank you so much for your guidance.

Upvotes: 2

Views: 6475

Answers (2)

Taryn
Taryn

Reputation: 247690

You can use the sql server PIVOT function for this. If you know the all of the values for the years as well as the buckets then you can hard-code the query:

select *
from
(
  select 
    case 
      when times = 0 then '0' 
      when times >= 1 and times <=30 then '1-30'
      when times >= 31 and times <=60 then '31-60'  
      when times >= 61 and times <=90 then '61-90' 
      when times >= 91 and times <=120 then '91-120' 
      else '121+' end bucket,
    year
  from yourtable
) src
pivot
(
  count(year)
  for year in ([2008], [2009], [2010], [2011], [2012])
) piv;

See SQL Fiddle with Demo

If you don't have access to the PIVOT function then you can use an aggregate function with a CASE:

select bucket,
  sum(case when year = 2008 then 1 else 0 end) [2008],
  sum(case when year = 2009 then 1 else 0 end) [2009],
  sum(case when year = 2010 then 1 else 0 end) [2010],
  sum(case when year = 2011 then 1 else 0 end) [2011],
  sum(case when year = 2012 then 1 else 0 end) [2012]
from
(
  select 
    case 
      when times = 0 then '0' 
      when times >= 1 and times <=30 then '1-30' 
      when times >= 31 and times <=60 then '31-60'  
      when times >= 61 and times <=90 then '61-90' 
      when times >= 91 and times <=120 then '91-120' 
      else '121+' end bucket,
    year
  from yourtable
) src
group by bucket

See SQL Fiddle with Demo

If you need all of the buckets to be listed, then you will want to have the bucket ranges stored in either a table or using a CTE query, then you can use the following:

;with buckets(startbucket, endbucket, rnk) as
(
  select 0, 0, 1 
  union all
  select 1, 30, 2
  union all
  select 31, 60, 3
  union all
  select 61, 90, 4
  union all
  select 91, 120, 5
  union all
  select 121, null, 6
)
select 
  case when startbucket = 0 then '0'
    when endbucket is null then cast(startbucket as varchar(50)) + '+'
    else cast(startbucket as varchar(50)) + '-'+cast(endbucket as varchar(50)) end buckets,
  [2008], [2009], [2010], [2011], [2012]
from
(
  select rnk,
    year, 
    startbucket, 
    endbucket
  from buckets b
  left join yourtable t
    on t.times >= b.startbucket and t.times <= coalesce(b.endbucket, 100000)
) src
pivot
(
  count(year)
  for year in ([2008], [2009], [2010], [2011], [2012])
) piv;

See SQL Fiddle with Demo

Result:

| BUCKETS | 2008 | 2009 | 2010 | 2011 | 2012 |
----------------------------------------------
|       0 |    0 |    0 |    0 |    0 |    0 |
|    1-30 |    1 |    1 |    2 |    1 |    0 |
|   31-60 |    0 |    1 |    0 |    0 |    0 |
|   61-90 |    1 |    0 |    0 |    0 |    1 |
|  91-120 |    0 |    0 |    0 |    0 |    0 |
|    121+ |    0 |    0 |    0 |    1 |    0 |

The above will work great if you have a known number of values (years) that you need to transpose. If you have an unknown number then you will want to implement dynamic sql, similar to this:

DECLARE @cols AS NVARCHAR(MAX),
    @query  AS NVARCHAR(MAX)

select @cols = STUFF((SELECT distinct ',' + QUOTENAME(year) 
                    from yourtable
            FOR XML PATH(''), TYPE
            ).value('.', 'NVARCHAR(MAX)') 
        ,1,1,'')

set @query = 'with buckets(startbucket, endbucket, rnk) as
              (
                select 0, 0, 1 
                union all
                select 1, 30, 2
                union all
                select 31, 60, 3
                union all
                select 61, 90, 4
                union all
                select 91, 120, 5
                union all
                select 121, null, 6
              )
              select 
                case when startbucket = 0 then ''0''
                  when endbucket is null then cast(startbucket as varchar(50)) + ''+''
                  else cast(startbucket as varchar(50)) + ''-''+cast(endbucket as varchar(50)) end buckets,
                '+@cols+'
              from
              (
                select rnk,
                  year, 
                  startbucket, endbucket
                from buckets b
                left join yourtable t
                  on t.times >= b.startbucket and t.times <= coalesce(b.endbucket, 100000)
              ) src
              pivot
              (
                count(year)
                for year in ('+@cols+')
              ) piv;'

execute(@query)

See SQL Fiddle with Demo

The result will be the same for both the static (hard-coded) version and the dynamic version.

Upvotes: 8

Tim S
Tim S

Reputation: 707

Darn it! Bluefeet beat me to it. My attempt is similar but uses a table to configure the buckets.

CREATE TABLE Bucket
(
    id int,
    minbound int,
    maxbound int
)

INSERT INTO Bucket VALUES(1, 0, 30)
                    ,(2, 31, 60)
                    ,(3, 61, 90)
                    ,(4, 91, 120)
                    ,(5, 121, null)

Then one can calculate the bucket for each record in a CTE like so....

;WITH RecordBucket
AS
(
    SELECT
        r.*,
        b.id as bucketid
    FROM
        Record r
        INNER JOIN Bucket b ON r.times BETWEEN b.minbound and ISNULL(b.maxbound, 20000000)
)

...and outer join back to the buckets for the final query to allow ordering and empty buckets to be included:

select 
    b.id as BucketId, 
    CASE
        WHEN b.maxbound IS NULL THEN CONVERT(VARCHAR(16), b.minbound) + '+'
        ELSE CONVERT(VARCHAR(16), b.minbound) + ' - ' + CONVERT(VARCHAR(16), b.maxbound)
    END as BucketName,
    [2008],[2009],[2010],[2011] 
from 
    Bucket b
    LEFT JOIN
    (
        SELECT
            bucketid,
            times,
            year
        from
            RecordBucket
    ) rb 
    pivot (count(times) for year in ([2008],[2009],[2010],[2011])) 
    as pvt ON b.id = pvt.bucketid
order by
    bucketid

Upvotes: 3

Related Questions