user10901
user10901

Reputation: 639

Create range bins from SQL Server table for histograms

I have the following table in SQL Server:

-----------------------------
ID       Age          Gender
1          30           F
2          35           M
3          32           M
4          18           F
5          21           F

What I need to do is to execute a query that will group the records in given ranges and count the occurences. The results need to be displayed later in a histogram chart (bar chart). I tried a query similar to the following:

SELECT 
count(CASE WHEN Age>= 10 AND Age < 20 THEN 1 END) AS '10 - 20',
count(CASE WHEN Age>= 21 AND Age < 30 THEN 1 END) AS '21 - 30',
count(CASE WHEN Age>= 31 AND Age < 35 THEN 1 END) AS '31 - 35',
count(CASE WHEN Age>= 36 AND Age < 40 THEN 1 END) AS '36 - 40',
FROM (SELECT Age FROM Attendees) AS AgeGroups

For the moment, this does the trick, but does not consider the gender column. It will result a single row which counts the frequency of every age group:

10-20     21-30     31-35     36-40
  0         22        21        13

If gender would be considered there should be two records shown, for each gender. I need to see something like:

Gender    10-20     21-30     31-35     36-40
  M         0         12        9         6
  F         0         10        12        7

How should I approach this problem?

Upvotes: 30

Views: 85120

Answers (3)

David Mendes
David Mendes

Reputation: 367

I recently came across a similar problem where I needed to look at several variables rather than just one, and my solution was to use a temporary table.

CREATE TABLE #bin (
    startRange int, 
    endRange int,
    agelabel varchar(10)
);
GO

INSERT INTO #bin (startRange, endRange, mylabel) VALUES (10, 20, '10-20')
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (21, 30, '21-30')
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (31, 35, '31-35')
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (36, 40, '36-40')
GO

SELECT 
    b.agelabel as ageBracket, 
    a.Gender, 
    count(a.Gender) as total
FROM 
    Attendees a
INNER JOIN 
    #bin b on (a.Age >= b.startRange and a.Age <= b.EndRange)
GROUP BY
    b.agelabel, a.Gender

DROP TABLE #bin 
GO

Or Alternatively, and probably the better solution,

With table1 as
(
SELECT 
    CASE 
        WHEN Age >= 10 and Age <= 20 then '10-20'
        WHEN Age > 20 and Age <= 30 then '21-30'
        WHEN Age > 30 and Age <= 35 then '31-35'
        WHEN Age > 35 and Age <= 40 then '36-40'
        ELSE 'NA'
    End as ageBracket,
    Gender
FROM
    Attendees
)

SELECT
    ageBracket,
    Gender,
    Count(Gender),
FROM
    table1
GROUP BY
    ageBracket, Gender

Where the result would be :

AgeBracket Gender Total
10-20 M 0
10-20 F 0
21-30 M 12
21-30 F 10
31-35 M  9
31-35 F 12
36-40 M  6
36-40 F  7

You can use the first select statement to gather all the data of your choice, while using the second query to perform any necessary calculations.

I think these solutions might be a bit overkill for your problem, but as it was the only question I found concerning binning, hopefully it will be useful to others !

Upvotes: 3

lonniect
lonniect

Reputation: 21

The example above excludes values 20, 30, 35, & 40. There is more than one way to skin this cat, here is one:

SELECT 
Gender,
count(CASE WHEN Age> 9 AND Age <= 20 THEN 1 END) AS [10 - 20],
count(CASE WHEN Age> 20 AND Age <= 30 THEN 1 END) AS [21 - 30],
count(CASE WHEN Age> 30 AND Age <= 35 THEN 1 END) AS [31 - 35],
count(CASE WHEN Age> 35 AND Age <= 40 THEN 1 END) AS [36 - 40]
FROM Attendees AS AgeGroups
GROUP BY Gender

Upvotes: 2

Lamak
Lamak

Reputation: 70658

Simply add the Gender column to your SELECT and then do a GROUP BY.

SELECT 
Gender,
count(CASE WHEN Age>= 10 AND Age < 20 THEN 1 END) AS [10 - 20],
count(CASE WHEN Age>= 21 AND Age < 30 THEN 1 END) AS [21 - 30],
count(CASE WHEN Age>= 31 AND Age < 35 THEN 1 END) AS [31 - 35],
count(CASE WHEN Age>= 36 AND Age < 40 THEN 1 END) AS [36 - 40]
FROM Attendees AS AgeGroups
GROUP BY Gender

Upvotes: 41

Related Questions