Hugh

Reputation: 2569

In SQL, how can you "group by" in ranges?

Suppose I have a table with a numeric column (let's call it "score").

I'd like to generate a table of counts that shows how many times scores appeared in each range.

For example:

score range  | number of occurrences
-------------------------------------
   0-9       |        11
  10-19      |        14
  20-29      |         3
   ...       |       ...

In this example there were 11 rows with scores in the range of 0 to 9, 14 rows with scores in the range of 10 to 19, and 3 rows with scores in the range 20-29.

Is there an easy way to set this up? What do you recommend?

Upvotes: 230

Views: 246611

Answers (19)

Lukasz Szozda

Reputation: 175586

The SQL standard defines the WIDTH_BUCKET(<expr>, <min_value>, <max_value>, <num_buckets>) function:

SELECT WIDTH_BUCKET(score, 0, 50, 5) AS bucket_num, COUNT(*)
FROM tab
GROUP BY WIDTH_BUCKET(score, 0, 50, 5)
ORDER BY bucket_num;

For input:

CREATE TABLE tab(score INT);

INSERT INTO tab(score) VALUES (1),(2),(9),(10),(11),(22),(23),(41);

Output:

bucket_num  count
1   3
2   2
3   2
5   1

Human-readable bucket range:

SELECT CONCAT((WIDTH_BUCKET(score, 0, 50, 5)-1)*10, '-', WIDTH_BUCKET(score, 0, 50, 5)*10-1) AS bucket_num,
       COUNT(*)
FROM tab
GROUP BY CONCAT((WIDTH_BUCKET(score, 0, 50, 5)-1)*10, '-', WIDTH_BUCKET(score, 0, 50, 5)*10-1)
ORDER BY bucket_num;

Output:

bucket_num  count
0-9     3
10-19   2
20-29   2
40-49   1

T612, Advanced OLAP operations

Transact-SQL partially supports this feature. Transact-SQL does not support the WIDTH_BUCKET, PERCENT_RANK, and CUME_DIST functions or the WINDOW and FILTER clauses.
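Where WIDTH_BUCKET is unavailable, its arithmetic is easy to reproduce. The following is only a minimal sketch in Python, assuming an ascending range where values below the minimum go to bucket 0 and values at or above the maximum go to bucket num_buckets + 1. The function name and the Counter-based tally are illustrative choices, not part of any database API.

```python
from collections import Counter
import math


def width_bucket(value, low, high, num_buckets):
    """Emulate WIDTH_BUCKET for an ascending range [low, high)."""
    if value < low:
        return 0                    # underflow bucket
    if value >= high:
        return num_buckets + 1      # overflow bucket
    # Equal-width buckets numbered from 1.
    return math.floor((value - low) * num_buckets / (high - low)) + 1


# Same rows as the example input above.
scores = [1, 2, 9, 10, 11, 22, 23, 41]
counts = Counter(width_bucket(s, 0, 50, 5) for s in scores)

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

For these rows the tally matches the first output table above: buckets 1, 2, 3 and 5 with counts 3, 2, 2 and 1.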

Upvotes: 0

XAJA

Reputation: 37

SELECT
  COUNT(*) AS number_of_occurances,
  FLOOR(scores / 10) * 10 AS scores_in_range
FROM ScoreTable
GROUP BY scores_in_range
ORDER BY scores_in_range DESC;

Upvotes: 0

mhawke

Reputation: 87054

In postgres (where || is the string concatenation operator):

select (score/10)*10 || '-' || (score/10)*10+9 as scorerange, count(*)
from scores
group by score/10
order by 1

gives:

 scorerange | count 
------------+-------
 0-9        |    11
 10-19      |    14
 20-29      |     3
 30-39      |     2

And here's how to do it in T-SQL:

DECLARE @traunch INT = 1000;

SELECT 
    CONCAT
    ( 
      FORMAT((score / @traunch) * @traunch, '###,000,000') 
      , ' - ' , 
      FORMAT((score / @traunch) * @traunch + @traunch - 1, '###,000,000') 
    ) as [Range]
  , FORMAT(MIN(score), 'N0') as [Min]
  , FORMAT(AVG(score), 'N0') as [Avg]
  , FORMAT(MAX(score), 'N0') as [Max]
  , FORMAT(COUNT(score), 'N0') as [Count]
  , FORMAT(SUM(score), 'N0') as [Sum]
FROM scores
GROUP BY score / @traunch
ORDER BY score / @traunch

Upvotes: 31

Alex Punnen

Reputation: 6224

For PrestoSQL/Trino, applying Ken's answer (https://stackoverflow.com/a/232463/429476):

select t.range, count(*) as "Number of Occurance", ROUND(AVG(fare_amount),2) as "Avg",
  ROUND(MAX(fare_amount),2) as "Max" ,ROUND(MIN(fare_amount),2) as "Min" 
from (
  select 
   case 
      when trip_distance between  0 and  9 then ' 0-9 '
      when trip_distance between 10 and 19 then '10-19'
      when trip_distance between 20 and 29 then '20-29'
      when trip_distance between 30 and 39 then '30-39'
      else '> 39' 
   end as range ,fare_amount 
  from nyc_in_parquet.tlc_yellow_trip_2022) t
  where fare_amount > 1 and fare_amount < 401092
group by t.range;

 range | Number of Occurance |  Avg   |  Max  | Min  
-------+---------------------+--------+-------+------
  0-9  |             2260865 |  10.28 | 720.0 | 1.11 
 30-39 |                1107 | 104.28 | 280.0 |  5.0 
 10-19 |              126136 |   43.8 | 413.5 |  2.0 
 > 39  |               42556 |  39.11 | 668.0 | 1.99 
 20-29 |               19133 |  58.62 | 250.0 |  2.5 

Upvotes: 0

April Rose Garcia

Reputation: 1

I'm here because I have a similar question, but I find the short answers wrong, and the one with the repeated "case when" is too much work. Seeing anything repetitive in my code hurts my eyes. So here is my solution:

SELECT --MIN(score), MAX(score),
    [score range] = CAST(ROUND(score-5,-1)AS VARCHAR) + ' - ' + CAST((ROUND(score-5,-1)+10)AS VARCHAR),
    [number of occurrences] = COUNT(*)
FROM [order] -- ORDER is a reserved word, so the table name must be bracketed
GROUP BY  CAST(ROUND(score-5,-1)AS VARCHAR) + ' - ' + CAST((ROUND(score-5,-1)+10)AS VARCHAR)
ORDER BY MIN(score)


Upvotes: 0

user8494871

Reputation: 21

select t.range as score, count(*) as Count 
from (
      select UserId,
         case when isnull(score ,0) >= 0 and isnull(score ,0)< 5 then '0-5'
                when isnull(score ,0) >= 5 and isnull(score ,0)< 10 then '5-10'
                when isnull(score ,0) >= 10 and isnull(score ,0)< 15 then '10-15'
                when isnull(score ,0) >= 15 and isnull(score ,0)< 20 then '15-20'               
         else ' 20+' end as range
         ,case when isnull(score ,0) >= 0 and isnull(score ,0)< 5 then 1
                when isnull(score ,0) >= 5 and isnull(score ,0)< 10 then 2
                when isnull(score ,0) >= 10 and isnull(score ,0)< 15 then 3
                when isnull(score ,0) >= 15 and isnull(score ,0)< 20 then 4             
         else 5  end as pd
     from [score table]
     ) t

group by t.range,pd order by pd

Upvotes: 0

Ron Tuffin

Reputation: 54620

Neither of the highest-voted answers is correct on SQL Server 2000. Perhaps they were using a different version.

Here are the correct versions of both of them on SQL Server 2000.

select t.range as [score range], count(*) as [number of occurrences]
from (
  select case  
    when score between 0 and 9 then ' 0- 9'
    when score between 10 and 19 then '10-19'
    else '20-99' end as range
  from scores) t
group by t.range

or

select t.range as [score range], count(*) as [number of occurrences]
from (
      select user_id,
         case when score >= 0 and score< 10 then '0-9'
         when score >= 10 and score< 20 then '10-19'
         else '20-99' end as range
     from scores) t
group by t.range

Upvotes: 181

Stubo

Reputation: 11

Try

SELECT (str(range) + "-" + str(range + 9) ) AS [Score range], COUNT(score) AS [number of occurances]
FROM (SELECT  score,  int(score / 10 ) * 10  AS range  FROM scoredata )  
GROUP BY range;

Upvotes: 1

trevorgrayson

Reputation: 1867

This lets you avoid specifying the ranges by hand, and it should be SQL server agnostic. Math FTW!

SELECT CONCAT(range,'-',range+9), COUNT(range)
FROM (
  SELECT 
    score - (score % 10) as range
  FROM scores
) t
GROUP BY range

Upvotes: 6

JoshNaro

Reputation: 2097

I would do this a little differently so that it scales without having to define every case:

select t.range as [score range], count(*) as [number of occurences]
from (
  select FLOOR(score/10) as range
  from scores) t
group by t.range

Not tested, but you get the idea...

Upvotes: 4

Kevin Hogg

Reputation: 1781

Because the column being sorted on (Range) is a string, string/word sorting is used instead of numeric sorting.

As long as the strings have zeros to pad out the number lengths the sorting should still be semantically correct:

SELECT t.range AS ScoreRange,
       COUNT(*) AS NumberOfOccurrences
  FROM (SELECT CASE
                    WHEN score BETWEEN 0 AND 9 THEN '00-09'
                    WHEN score BETWEEN 10 AND 19 THEN '10-19'
                    ELSE '20-99'
               END AS Range
          FROM Scores) t
 GROUP BY t.Range

If the range is mixed, simply pad an extra zero:

SELECT t.range AS ScoreRange,
       COUNT(*) AS NumberOfOccurrences
  FROM (SELECT CASE
                    WHEN score BETWEEN 0 AND 9 THEN '000-009'
                    WHEN score BETWEEN 10 AND 19 THEN '010-019'
                    WHEN score BETWEEN 20 AND 99 THEN '020-099'
                    ELSE '100-999'
               END AS Range
          FROM Scores) t
 GROUP BY t.Range
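
To see why the padding matters, here is a small illustrative sketch in Python. It assumes plain code-point string comparison, which is what Python uses; a database collation may order punctuation differently, so treat it as a demonstration of the idea rather than of any particular server. The label lists are made up for the example.

```python
# Range labels as they might come back from a GROUP BY, before sorting.
unpadded = ["0-9", "10-19", "100-109", "20-29"]
padded = ["000-009", "010-019", "100-109", "020-029"]

# String sorting compares character by character, so "100-109"
# lands before "20-29" when the labels are not zero-padded.
sorted_unpadded = sorted(unpadded)

# With equal-width, zero-padded numbers, string order and numeric order agree.
sorted_padded = sorted(padded)

print(sorted_unpadded)
print(sorted_padded)
```

An alternative that avoids padding altogether is to sort on the numeric lower bound of each range instead of on the label.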

Upvotes: 1

Danny Hui

Reputation: 11

select t.blah as [score range], count(*) as [number of occurences]
from (
  select case 
    when score between  0 and  9 then ' 0-9 '
    when score between 10 and 19 then '10-19'
    when score between 20 and 29 then '20-29'
    ...
    else '90-99' end as blah
  from scores) t
group by t.blah

Make sure you use a word other than 'range' if you are in MySQL, or you will get an error for running the above example.

Upvotes: 1

Ken Paul

Reputation: 5765

I see answers here that won't work in SQL Server's syntax. I would use:

select t.range as [score range], count(*) as [number of occurences]
from (
  select case 
    when score between  0 and  9 then ' 0-9 '
    when score between 10 and 19 then '10-19'
    when score between 20 and 29 then '20-29'
    ...
    else '90-99' end as range
  from scores) t
group by t.range

EDIT: see comments

Upvotes: 33

Walter Mitty

Reputation: 18940

An alternative approach would involve storing the ranges in a table, instead of embedding them in the query. You would end up with a table, call it Ranges, that looks like this:

LowerLimit   UpperLimit   Range 
0              9          '0-9'
10            19          '10-19'
20            29          '20-29'
30            39          '30-39'

And a query that looks like this:

Select
   Range as [Score Range],
   Count(*) as [Number of Occurrences]
from
   Ranges r inner join Scores s on s.Score between r.LowerLimit and r.UpperLimit
group by Range

This does mean setting up a table, but it would be easy to maintain when the desired ranges change. No code changes necessary!
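
The range-table approach can be tried out end to end with a rough sketch like the one below, which runs the join in an in-memory SQLite database from Python. The sample scores are invented, and the label column is called RangeLabel here (an assumption, chosen to sidestep the keyword RANGE); ordering by LowerLimit is also an addition so the rows come back in numeric order.

```python
import sqlite3

ranges_db = sqlite3.connect(":memory:")
ranges_db.executescript("""
    CREATE TABLE Ranges (LowerLimit INTEGER, UpperLimit INTEGER, RangeLabel TEXT);
    INSERT INTO Ranges VALUES (0, 9, '0-9'), (10, 19, '10-19'),
                              (20, 29, '20-29'), (30, 39, '30-39');
    CREATE TABLE Scores (Score INTEGER);
    -- Invented sample data for the sketch.
    INSERT INTO Scores VALUES (3), (7), (12), (15), (18), (25);
""")

# Each score joins to the one range whose limits enclose it.
range_counts = ranges_db.execute("""
    SELECT r.RangeLabel, COUNT(*)
    FROM Ranges r
    INNER JOIN Scores s ON s.Score BETWEEN r.LowerLimit AND r.UpperLimit
    GROUP BY r.RangeLabel, r.LowerLimit
    ORDER BY r.LowerLimit
""").fetchall()

for label, occurrences in range_counts:
    print(label, occurrences)
```

Because this is an inner join, a range with no matching scores (30-39 in this data) does not appear in the result; a left join from Ranges would be needed to report it with a count of zero.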

Upvotes: 45

Aheho

Reputation: 12821

declare @RangeWidth int

set @RangeWidth = 10

select
   Floor(Score/@RangeWidth)*@RangeWidth as LowerBound,
   Floor(Score/@RangeWidth)*@RangeWidth+@RangeWidth-1 as UpperBound,
   Count(*)
From
   ScoreTable
group by
   Floor(Score/@RangeWidth)

Upvotes: 2

Timothy Walters

Reputation: 16874

James Curran's answer was the most concise in my opinion, but the output wasn't correct. For SQL Server the simplest statement is as follows:

SELECT 
    [score range] = CAST((Score/10)*10 AS VARCHAR) + ' - ' + CAST((Score/10)*10+9 AS VARCHAR), 
    [number of occurrences] = COUNT(*)
FROM #Scores
GROUP BY Score/10
ORDER BY Score/10

This assumes a #Scores temporary table that I used to test it; I just populated it with 100 rows of random numbers between 0 and 99.
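
The same integer-division idea can be checked quickly outside SQL Server. This is only a sketch using an in-memory SQLite database from Python, with made-up sample scores; the table name, the column aliases and the choice to format the label in Python rather than in SQL are all assumptions for the illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE scores (score INTEGER)")
# Made-up sample rows.
demo.executemany("INSERT INTO scores (score) VALUES (?)",
                 [(s,) for s in (1, 2, 9, 10, 11, 22, 23, 41)])

# Integer division drops the last digit; multiplying by 10 again
# gives the lower bound of the range the score falls into.
bucket_rows = demo.execute("""
    SELECT (score / 10) * 10 AS lower_bound, COUNT(*) AS occurrences
    FROM scores
    GROUP BY lower_bound
    ORDER BY lower_bound
""").fetchall()

# Build the human-readable label from the numeric lower bound.
report = [(f"{low} - {low + 9}", n) for low, n in bucket_rows]

for label, n in report:
    print(label, n)
```

Grouping and ordering on the numeric lower bound keeps the rows in numeric order. Note that integer division truncates toward zero, so negative scores would need separate handling.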

Upvotes: 13

Richard T

Reputation: 4665

Perhaps you're asking about keeping such things going...

Of course, these queries will invoke a full table scan. If the table containing the scores to be tallied is large, you might want a better-performing solution: create a secondary table that holds the counts and keep it up to date with rules, such as on insert. It might be worth looking into.

Not all RDBMS engines have rules, though!
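
As a rough illustration of the secondary-table idea, the sketch below uses an AFTER INSERT trigger in SQLite (run from Python) in place of a rule, since triggers are the more widely available mechanism. The table, column and trigger names are hypothetical, and only inserts are handled; updates and deletes would need their own triggers.

```python
import sqlite3

tally_db = sqlite3.connect(":memory:")
tally_db.execute("CREATE TABLE scores (score INTEGER)")
tally_db.execute("""
    CREATE TABLE score_counts (
        lower_bound INTEGER PRIMARY KEY,   -- 0, 10, 20, ...
        occurrences INTEGER NOT NULL
    )
""")

# On every insert, make sure the bucket row exists, then bump its count.
tally_db.execute("""
    CREATE TRIGGER tally_score AFTER INSERT ON scores
    BEGIN
        INSERT OR IGNORE INTO score_counts (lower_bound, occurrences)
            VALUES ((NEW.score / 10) * 10, 0);
        UPDATE score_counts
            SET occurrences = occurrences + 1
            WHERE lower_bound = (NEW.score / 10) * 10;
    END
""")

# Made-up sample rows.
tally_db.executemany("INSERT INTO scores (score) VALUES (?)",
                     [(3,), (7,), (12,), (25,), (28,)])

# Reading the tallies no longer scans the scores table.
tallies = tally_db.execute(
    "SELECT lower_bound, occurrences FROM score_counts ORDER BY lower_bound"
).fetchall()
print(tallies)
```

The trade-off is a little extra work on every insert in exchange for cheap reads of the counts.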

Upvotes: -2

James Curran

Reputation: 103485

select cast(score/10 as varchar) + '-' + cast(score/10+9 as varchar), 
       count(*)
from scores
group by score/10

Upvotes: 6

tvanfosson

Reputation: 532435

create table scores (
   user_id int,
   score int
)

select t.range as [score range], count(*) as [number of occurences]
from (
      select user_id,
         case when score >= 0 and score < 10 then '0-9'
              when score >= 10 and score < 20 then '10-19'
         ...
         else '90-99' end as range
     from scores) t
group by t.range

Upvotes: 7
