Mikko
Mikko

Reputation: 602

Get frequency distribution of a decimal range in MySQL

I'm looking for an elegant way (in terms of syntax, not necessarily efficient) to get the frequency distribution of a decimal range.

For example, I have a table with ratings column which can be a negative or positive. I want to get the frequency of rows with a rating of certain range. - ... - [-140.00 to -130.00): 5 - [-130.00 to -120.00): 2 - [-120.00 to -110.00): 1 - ... - [120.00 to 130.00): 17 - and so on.

[i to j) means i inclusive to j exclusive.

Thanks in advance.

Upvotes: 2

Views: 2348

Answers (4)

wildplasser
wildplasser

Reputation: 44250

Just put your desired ranges into a table, and use that to discriminate the values.

-- SET search_path='tmp';

DROP TABLE measurements;

CREATE TABLE measurements
        ( zval INTEGER NOT NULL PRIMARY KEY
        );
INSERT INTO measurements (zval)
        SELECT generate_series(1,1000);
DELETE FROM measurements WHERE random() < 0.20 ;

DROP TABLE ranges;
CREATE TABLE ranges
        ( zmin INTEGER NOT NULL PRIMARY KEY
        , zmax INTEGER NOT NULL
        );
INSERT INTO ranges(zmin,zmax) VALUES
(0, 100), (100, 200), (200, 300),  (300, 400), (400, 500),
(500, 600), (600, 700), (700, 800),  (800, 900), (900, 1000)
        ;

SELECT ra.zmin,ra.zmax
        , COUNT(*) AS zcount
FROM ranges ra
JOIN measurements me
  ON me.zval >= ra.zmin AND me.zval < ra.zmax
GROUP BY ra.zmin,ra.zmax
ORDER BY ra.zmin
        ;

Results:

 zmin | zmax | zcount 
------+------+--------
    0 |  100 |     89
  100 |  200 |     76
  200 |  300 |     76
  300 |  400 |     74
  400 |  500 |     86
  500 |  600 |     78
  600 |  700 |     75
  700 |  800 |     75
  800 |  900 |     80
  900 | 1000 |     82
(10 rows)

Upvotes: 0

ethrbunny
ethrbunny

Reputation: 10469

You could get pretty close using 'select floor(rating / 10), count(*) from (table) group by 1'

Upvotes: 12

Michael Durrant
Michael Durrant

Reputation: 96544

I was thinking of seomthing that could do many levels like

DELIMITER $$  
CREATE PROCEDURE populate_stats()

   BEGIN
      DECLARE range_loop INT Default 500 ;
      simple_loop: LOOP
         SET the_next = range_loop - 10;
         Select sum(case when range between range_loop and the_next then 1 else 0 end) from table,
         IF the_next=-500 THEN
            LEAVE simple_loop;
         END IF;
   END LOOP simple_loop;
END $$



usage: call populate_stats();

Would handle 100 ranges from 500-490, 490-480, ... -480 - -490, -490 - -500

Upvotes: 1

xQbert
xQbert

Reputation: 35333

assuming a finite number of ranges.

Select 
sum(case when val between -140 to -130 then 1 else 0 end) as sum-140_to_-130,
sum(Case when val between -130 to -120 then 1 else 0 end) as sum-130_to_-140,
...

FROM table

and if not, you could use dynamic SQL to generate the select allowing a number of ranges however you may run into a column limitation.

Upvotes: 0

Related Questions