synapse
synapse

Reputation: 5728

Group records by range

I need to select a count of items that fall into some range like this

create table numbers (val int);

insert into numbers(val) values (2), (3), (11), (12), (13), (31);

select count(1) as qty , val / 10 as range
from numbers
group by val / 10;

Obviously, if there're no items in range, it won't be included in the output. I could think of some inelegant ways to include all ranges in the output, but is there an elegant and fast one (in PostgreSQL or MS SQL Server dialect)

Upvotes: 2

Views: 2637

Answers (4)

greg
greg

Reputation: 3495

Postgresql 9.2 also implements the range type:

  SELECT 
    range.r, 
    count(val.n) 
  FROM 
    (VALUES (int4range '[0, 10)'), (int4range '[10, 20)'), (int4range '[20, 30)'), (int4range '[30, 40)')) range (r) 
     LEFT JOIN (VALUES (2), (3), (9), (11), (17), (31), (33)) AS val (n) ON  val.n <@   range.r
  GROUP BY
    range.r
  ORDER BY r ASC

┌─────────┬───────┐
│    r    │ count │
├─────────┼───────┤
│ [0,10)  │     3 │
│ [10,20) │     2 │
│ [20,30) │     0 │
│ [30,40) │     2 │
└─────────┴───────┘

Upvotes: 4

Craig Ringer
Craig Ringer

Reputation: 324465

It appears that you want to create a histogram of your results.

PostgreSQL:

select x, count(val)
from generate_series(1,6) x 
left outer join numbers on (x = width_bucket(val, 0, 60, 6))
group by x;

I've used width_bucket instead of a simple division and modulus because it's more general and easier to get right for more complex ranges. Also, it's awesome.

Mark Bannister's recursive CTE for sequence generation can be integrated and JOINed on as x instead of the generate_series for added portability if you want, and the limits can be automatically determined:

with recursive ranges(rangev) as (
    select 0 rangev union all select rangev+1 as rangev from ranges where rangev < 4
), bounds(lower_bucket, upper_bucket) as (
    select (min(val))/10, (max(val)/10)+1 from numbers
)
select
    rangev as bucket,
    rangev*10 AS lower_bound,
    (rangev+1)*10-1 AS upper_bound,
    count(val) AS num_in_bucket
from ranges cross join bounds
left outer join numbers on (rangev = width_bucket(val, lower_bucket, upper_bucket*10, upper_bucket))
group by rangev
order by rangev asc;

If you prefer /10 over width_bucket (say, if width_bucket isn't available in MS SQL) that's easy to change back.

Output:

 bucket | lower_bound | upper_bound | num_in_bucket 
--------+-------------+-------------+---------------
      0 |           0 |           9 |             0
      1 |          10 |          19 |             2
      2 |          20 |          29 |             3
      3 |          30 |          39 |             0
      4 |          40 |          49 |             1
(5 rows)

Upvotes: 5

user359040
user359040

Reputation:

Try:

with ranges as 
(select 0 rangev union all 
 select rangev+1 as rangev 
 from ranges 
 where rangev < 4)
select count(val) as qty , rangev
from ranges 
left join numbers on rangev = val / 10
group by rangev;

(SQLServer version - change with ranges to with recursive ranges for PostgresQL)

Upvotes: 3

podiluska
podiluska

Reputation: 51494

For MS SQL Server

declare @min int, @max int
    select @min = MIN(val/10), @max = MAX(val/10) from numbers

    select ranges.number as range, COUNT(val) as qty from master.dbo.spt_values ranges
        left join numbers on ranges.number = numbers.val/10
    where type='p' and number between @min and @max
    group by number    

The same principle could be applied to Postgres although the generation of the numbers list would be different

Upvotes: 1

Related Questions