Reputation: 2569
Suppose I have a table with a numeric column (lets call it "score").
I'd like to generate a table of counts, that shows how many times scores appeared in each range.
For example:
score range | number of occurrences ------------------------------------- 0-9 | 11 10-19 | 14 20-29 | 3 ... | ...
In this example there were 11 rows with scores in the range of 0 to 9, 14 rows with scores in the range of 10 to 19, and 3 rows with scores in the range 20-29.
Is there an easy way to set this up? What do you recommend?
Upvotes: 230
Views: 246611
Reputation: 175586
SQL Standard defines WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets>)
function:
SELECT WIDTH_BUCKET(score, 0, 50, 5) AS bucket_num, COUNT(*)
FROM tab
GROUP BY WIDTH_BUCKET(score, 0, 50, 5)
ORDER BY bucket_num;
For input:
CREATE TABLE tab(score INT);
INSERT INTO tab(score) VALUES (1),(2),(9),(10),(11),(22),(23),(41);
Output:
bucket_num count
1 3
2 2
3 2
5 1
Human-readable bucket range:
SELECT CONCAT((WIDTH_BUCKET(score, 0, 50, 5)-1)*10, '-', WIDTH_BUCKET(score, 0, 50, 5)*10-1) AS bucket_num,
COUNT(*)
FROM tab
GROUP BY CONCAT((WIDTH_BUCKET(score, 0, 50, 5)-1)*10, '-', WIDTH_BUCKET(score, 0, 50, 5)*10-1)
ORDER BY bucket_num;
Output:
bucket_num count
0-9 3
10-19 2
20-29 2
40-49 1
T612, Advanced OLAP operations
Transact-SQL partially supports this feature. Transact-SQL does not support the WIDTH_BUCKET, PERCENT_RANK, and CUME_DIST functions or the WINDOW and FILTER clauses.
Upvotes: 0
Reputation: 37
SELECT
COUNT(*) AS number_of_occurances,
FLOOR(scores / 10) * 10 AS scores_in_range
FROM ScoreTable
GROUP BY scores_in_range
ORDER BY scores_in_range DESC;
Upvotes: 0
Reputation: 87054
In postgres (where ||
is the string concatenation operator):
select (score/10)*10 || '-' || (score/10)*10+9 as scorerange, count(*)
from scores
group by score/10
order by 1
gives:
scorerange | count
------------+-------
0-9 | 11
10-19 | 14
20-29 | 3
30-39 | 2
DECLARE @traunch INT = 1000;
SELECT
CONCAT
(
FORMAT((score / @traunch) * @traunch, '###,000,000')
, ' - ' ,
FORMAT((score / @traunch) * @traunch + @traunch - 1, '###,000,000')
) as [Range]
, FORMAT(MIN(score), 'N0') as [Min]
, FORMAT(AVG(score), 'N0') as [Avg]
, FORMAT(MAX(score), 'N0') as [Max]
, FORMAT(COUNT(score), 'N0') as [Count]
, FORMAT(SUM(score), 'N0') as [Sum]
FROM scores
GROUP BY score / @traunch
ORDER BY score / @traunch
Upvotes: 31
Reputation: 6224
For PrestoSQL/Trino applying answer from Ken https://stackoverflow.com/a/232463/429476
select t.range, count(*) as "Number of Occurance", ROUND(AVG(fare_amount),2) as "Avg",
ROUND(MAX(fare_amount),2) as "Max" ,ROUND(MIN(fare_amount),2) as "Min"
from (
select
case
when trip_distance between 0 and 9 then ' 0-9 '
when trip_distance between 10 and 19 then '10-19'
when trip_distance between 20 and 29 then '20-29'
when trip_distance between 30 and 39 then '30-39'
else '> 39'
end as range ,fare_amount
from nyc_in_parquet.tlc_yellow_trip_2022) t
where fare_amount > 1 and fare_amount < 401092
group by t.range;
range | Number of Occurance | Avg | Max | Min
-------+---------------------+--------+-------+------
0-9 | 2260865 | 10.28 | 720.0 | 1.11
30-39 | 1107 | 104.28 | 280.0 | 5.0
10-19 | 126136 | 43.8 | 413.5 | 2.0
> 39 | 42556 | 39.11 | 668.0 | 1.99
20-29 | 19133 | 58.62 | 250.0 | 2.5
Upvotes: 0
Reputation: 1
I'm here because i have similar question but i find the short answers wrong and the one with the continuous "case when" is to much work and seeing anything repetitive in my code hurts my eyes. So here is the solution
SELECT --MIN(score), MAX(score),
[score range] = CAST(ROUND(score-5,-1)AS VARCHAR) + ' - ' + CAST((ROUND(score-5,-1)+10)AS VARCHAR),
[number of occurrences] = COUNT(*)
FROM order
GROUP BY CAST(ROUND(score-5,-1)AS VARCHAR) + ' - ' + CAST((ROUND(score-5,-1)+10)AS VARCHAR)
ORDER BY MIN(score)
Upvotes: 0
Reputation: 21
select t.range as score, count(*) as Count
from (
select UserId,
case when isnull(score ,0) >= 0 and isnull(score ,0)< 5 then '0-5'
when isnull(score ,0) >= 5 and isnull(score ,0)< 10 then '5-10'
when isnull(score ,0) >= 10 and isnull(score ,0)< 15 then '10-15'
when isnull(score ,0) >= 15 and isnull(score ,0)< 20 then '15-20'
else ' 20+' end as range
,case when isnull(score ,0) >= 0 and isnull(score ,0)< 5 then 1
when isnull(score ,0) >= 5 and isnull(score ,0)< 10 then 2
when isnull(score ,0) >= 10 and isnull(score ,0)< 15 then 3
when isnull(score ,0) >= 15 and isnull(score ,0)< 20 then 4
else 5 end as pd
from score table
) t
group by t.range,pd order by pd
Upvotes: 0
Reputation: 54620
Neither of the highest voted answers are correct on SQL Server 2000. Perhaps they were using a different version.
Here are the correct versions of both of them on SQL Server 2000.
select t.range as [score range], count(*) as [number of occurences]
from (
select case
when score between 0 and 9 then ' 0- 9'
when score between 10 and 19 then '10-19'
else '20-99' end as range
from scores) t
group by t.range
or
select t.range as [score range], count(*) as [number of occurrences]
from (
select user_id,
case when score >= 0 and score< 10 then '0-9'
when score >= 10 and score< 20 then '10-19'
else '20-99' end as range
from scores) t
group by t.range
Upvotes: 181
Reputation: 11
Try
SELECT (str(range) + "-" + str(range + 9) ) AS [Score range], COUNT(score) AS [number of occurances]
FROM (SELECT score, int(score / 10 ) * 10 AS range FROM scoredata )
GROUP BY range;
Upvotes: 1
Reputation: 1867
This will allow you to not have to specify ranges, and should be SQL server agnostic. Math FTW!
SELECT CONCAT(range,'-',range+9), COUNT(range)
FROM (
SELECT
score - (score % 10) as range
FROM scores
)
Upvotes: 6
Reputation: 2097
I would do this a little differently so that it scales without having to define every case:
select t.range as [score range], count(*) as [number of occurences]
from (
select FLOOR(score/10) as range
from scores) t
group by t.range
Not tested, but you get the idea...
Upvotes: 4
Reputation: 1781
Because the column being sorted on (Range
) is a string, string/word sorting is used instead of numeric sorting.
As long as the strings have zeros to pad out the number lengths the sorting should still be semantically correct:
SELECT t.range AS ScoreRange,
COUNT(*) AS NumberOfOccurrences
FROM (SELECT CASE
WHEN score BETWEEN 0 AND 9 THEN '00-09'
WHEN score BETWEEN 10 AND 19 THEN '10-19'
ELSE '20-99'
END AS Range
FROM Scores) t
GROUP BY t.Range
If the range is mixed, simply pad an extra zero:
SELECT t.range AS ScoreRange,
COUNT(*) AS NumberOfOccurrences
FROM (SELECT CASE
WHEN score BETWEEN 0 AND 9 THEN '000-009'
WHEN score BETWEEN 10 AND 19 THEN '010-019'
WHEN score BETWEEN 20 AND 99 THEN '020-099'
ELSE '100-999'
END AS Range
FROM Scores) t
GROUP BY t.Range
Upvotes: 1
Reputation: 11
select t.blah as [score range], count(*) as [number of occurences]
from (
select case
when score between 0 and 9 then ' 0-9 '
when score between 10 and 19 then '10-19'
when score between 20 and 29 then '20-29'
...
else '90-99' end as blah
from scores) t
group by t.blah
Make sure you use a word other than 'range' if you are in MySQL, or you will get an error for running the above example.
Upvotes: 1
Reputation: 5765
I see answers here that won't work in SQL Server's syntax. I would use:
select t.range as [score range], count(*) as [number of occurences]
from (
select case
when score between 0 and 9 then ' 0-9 '
when score between 10 and 19 then '10-19'
when score between 20 and 29 then '20-29'
...
else '90-99' end as range
from scores) t
group by t.range
EDIT: see comments
Upvotes: 33
Reputation: 18940
An alternative approach would involve storing the ranges in a table, instead of embedding them in the query. You would end up with a table, call it Ranges, that looks like this:
LowerLimit UpperLimit Range
0 9 '0-9'
10 19 '10-19'
20 29 '20-29'
30 39 '30-39'
And a query that looks like this:
Select
Range as [Score Range],
Count(*) as [Number of Occurences]
from
Ranges r inner join Scores s on s.Score between r.LowerLimit and r.UpperLimit
group by Range
This does mean setting up a table, but it would be easy to maintain when the desired ranges change. No code changes necessary!
Upvotes: 45
Reputation: 12821
declare @RangeWidth int
set @RangeWidth = 10
select
Floor(Score/@RangeWidth) as LowerBound,
Floor(Score/@RangeWidth)+@RangeWidth as UpperBound,
Count(*)
From
ScoreTable
group by
Floor(Score/@RangeWidth)
Upvotes: 2
Reputation: 16874
James Curran's answer was the most concise in my opinion, but the output wasn't correct. For SQL Server the simplest statement is as follows:
SELECT
[score range] = CAST((Score/10)*10 AS VARCHAR) + ' - ' + CAST((Score/10)*10+9 AS VARCHAR),
[number of occurrences] = COUNT(*)
FROM #Scores
GROUP BY Score/10
ORDER BY Score/10
This assumes a #Scores temporary table I used to test it, I just populated 100 rows with random number between 0 and 99.
Upvotes: 13
Reputation: 4665
Perhaps you're asking about keeping such things going...
Of course you'll invoke a full table scan for the queries and if the table containing the scores that need to be tallied (aggregations) is large you might want a better performing solution, you can create a secondary table and use rules, such as on insert
- you might look into it.
Not all RDBMS engines have rules, though!
Upvotes: -2
Reputation: 103485
select cast(score/10 as varchar) + '-' + cast(score/10+9 as varchar),
count(*)
from scores
group by score/10
Upvotes: 6
Reputation: 532435
create table scores (
user_id int,
score int
)
select t.range as [score range], count(*) as [number of occurences]
from (
select user_id,
case when score >= 0 and score < 10 then '0-9'
case when score >= 10 and score < 20 then '10-19'
...
else '90-99' as range
from scores) t
group by t.range
Upvotes: 7