Reputation: 60731
I have a table:
+----+-------+------+
| id | times | year |
+----+-------+------+
| 5 | 2 | 2008 |
| 6 | 76 | 2008 |
| 2 | 43 | 2009 |
| 4 | 5 | 2009 |
| 1 | 3 | 2010 |
| 9 | 6 | 2010 |
| 7 | 444 | 2011 |
| 8 | 3 | 2011 |
| 3 | 65 | 2012 |
+----+-------+------+
I would like to create a pivot out of this table which buckets times
per year
:
+--------+------+------+------+------+------+
| | 2008 | 2009 | 2010 | 2011 | 2012 |
+--------+------+------+------+------+------+
| 0 | | | | | |
| 1-30 | 1 | 1 | 2 | 1 | |
| 31-60 | | 1 | | | |
| 61-90 | 1 | | | | 1 |
| 91-120 | | | | | |
| 121+ | | | | 1 | |
+--------+------+------+------+------+------+
how do i start to tackle this challenge with sql? thank you so much for your guidance.
Upvotes: 2
Views: 6475
Reputation: 247690
You can use the sql server PIVOT
function for this. If you know the all of the values for the years as well as the buckets then you can hard-code the query:
select *
from
(
select
case
when times = 0 then '0'
when times >= 1 and times <=30 then '1-30'
when times >= 31 and times <=60 then '31-60'
when times >= 61 and times <=90 then '61-90'
when times >= 91 and times <=120 then '91-120'
else '121+' end bucket,
year
from yourtable
) src
pivot
(
count(year)
for year in ([2008], [2009], [2010], [2011], [2012])
) piv;
If you don't have access to the PIVOT
function then you can use an aggregate function with a CASE
:
select bucket,
sum(case when year = 2008 then 1 else 0 end) [2008],
sum(case when year = 2009 then 1 else 0 end) [2009],
sum(case when year = 2010 then 1 else 0 end) [2010],
sum(case when year = 2011 then 1 else 0 end) [2011],
sum(case when year = 2012 then 1 else 0 end) [2012]
from
(
select
case
when times = 0 then '0'
when times >= 1 and times <=30 then '1-30'
when times >= 31 and times <=60 then '31-60'
when times >= 61 and times <=90 then '61-90'
when times >= 91 and times <=120 then '91-120'
else '121+' end bucket,
year
from yourtable
) src
group by bucket
If you need all of the buckets to be listed, then you will want to have the bucket ranges stored in either a table or using a CTE
query, then you can use the following:
;with buckets(startbucket, endbucket, rnk) as
(
select 0, 0, 1
union all
select 1, 30, 2
union all
select 31, 60, 3
union all
select 61, 90, 4
union all
select 91, 120, 5
union all
select 121, null, 6
)
select
case when startbucket = 0 then '0'
when endbucket is null then cast(startbucket as varchar(50)) + '+'
else cast(startbucket as varchar(50)) + '-'+cast(endbucket as varchar(50)) end buckets,
[2008], [2009], [2010], [2011], [2012]
from
(
select rnk,
year,
startbucket,
endbucket
from buckets b
left join yourtable t
on t.times >= b.startbucket and t.times <= coalesce(b.endbucket, 100000)
) src
pivot
(
count(year)
for year in ([2008], [2009], [2010], [2011], [2012])
) piv;
Result:
| BUCKETS | 2008 | 2009 | 2010 | 2011 | 2012 |
----------------------------------------------
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1-30 | 1 | 1 | 2 | 1 | 0 |
| 31-60 | 0 | 1 | 0 | 0 | 0 |
| 61-90 | 1 | 0 | 0 | 0 | 1 |
| 91-120 | 0 | 0 | 0 | 0 | 0 |
| 121+ | 0 | 0 | 0 | 1 | 0 |
The above will work great if you have a known number of values (years) that you need to transpose. If you have an unknown number then you will want to implement dynamic sql, similar to this:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(year)
from yourtable
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'with buckets(startbucket, endbucket, rnk) as
(
select 0, 0, 1
union all
select 1, 30, 2
union all
select 31, 60, 3
union all
select 61, 90, 4
union all
select 91, 120, 5
union all
select 121, null, 6
)
select
case when startbucket = 0 then ''0''
when endbucket is null then cast(startbucket as varchar(50)) + ''+''
else cast(startbucket as varchar(50)) + ''-''+cast(endbucket as varchar(50)) end buckets,
'+@cols+'
from
(
select rnk,
year,
startbucket, endbucket
from buckets b
left join yourtable t
on t.times >= b.startbucket and t.times <= coalesce(b.endbucket, 100000)
) src
pivot
(
count(year)
for year in ('+@cols+')
) piv;'
execute(@query)
The result will be the same for both the static (hard-coded) version and the dynamic version.
Upvotes: 8
Reputation: 707
Darn it! Bluefeet beat me to it. My attempt is similar but uses a table to configure the buckets.
CREATE TABLE Bucket
(
id int,
minbound int,
maxbound int
)
INSERT INTO Bucket VALUES(1, 0, 30)
,(2, 31, 60)
,(3, 61, 90)
,(4, 91, 120)
,(5, 121, null)
Then one can calculate the bucket for each record in a CTE like so....
;WITH RecordBucket
AS
(
SELECT
r.*,
b.id as bucketid
FROM
Record r
INNER JOIN Bucket b ON r.times BETWEEN b.minbound and ISNULL(b.maxbound, 20000000)
)
...and outer join back to the buckets for the final query to allow ordering and empty buckets to be included:
select
b.id as BucketId,
CASE
WHEN b.maxbound IS NULL THEN CONVERT(VARCHAR(16), b.minbound) + '+'
ELSE CONVERT(VARCHAR(16), b.minbound) + ' - ' + CONVERT(VARCHAR(16), b.maxbound)
END as BucketName,
[2008],[2009],[2010],[2011]
from
Bucket b
LEFT JOIN
(
SELECT
bucketid,
times,
year
from
RecordBucket
) rb
pivot (count(times) for year in ([2008],[2009],[2010],[2011]))
as pvt ON b.id = pvt.bucketid
order by
bucketid
Upvotes: 3