Reputation: 187379
In a MySQL database there's a table with a single numeric value
column. I want to plot the distribution of these values as a bar chart. The width of each bar in the chart should be uniform and the height of each bar should reflect the number of entries in this range.
For example, if the range of values is from -10..90
and the number of ranges (bars) is 10, then the first bar in the chart should show the number of entries in the range -10..0
and the last should show the number of entries in the range 80..90
.
Is there a better way to calculate the bar chart data than the following:
select min(value), max(value) from mytable
select count(*) from mytable where value between X and Y
A problem with this simplistic approach is that a separate query will need to be issued to find the number of values in each range, is it possible to make this more efficient, e.g. by building up the query dynamically and using a GROUP BY
?
Upvotes: 1
Views: 2992
Reputation: 33945
SELECT * FROM my_table;
+----+-----+
| id | val |
+----+-----+
| 1 | 19 |
| 2 | 10 |
| 3 | 6 |
| 4 | 29 |
| 6 | 27 |
| 7 | 20 |
| 8 | 11 |
| 9 | 12 |
| 13 | 16 |
| 14 | 38 |
| 15 | 8 |
| 16 | 22 |
| 17 | 23 |
| 18 | 16 |
| 19 | 20 |
| 20 | 18 |
| 28 | 18 |
| 29 | 7 |
| 30 | 10 |
| 31 | 34 |
| 32 | 11 |
| 33 | 17 |
| 34 | 15 |
| 35 | 12 |
| 36 | 19 |
| 37 | 15 |
| 38 | 18 |
| 39 | 24 |
| 40 | 10 |
| 41 | 19 |
| 42 | 25 |
| 43 | 22 |
| 59 | 27 |
| 60 | 14 |
| 61 | 17 |
| 62 | 25 |
| 63 | 28 |
| 64 | 20 |
| 65 | 10 |
| 66 | 18 |
| 67 | 22 |
| 68 | 12 |
| 69 | 18 |
| 70 | 31 |
| 71 | 22 |
| 72 | 23 |
| 73 | 20 |
| 74 | 12 |
| 75 | 10 |
| 76 | 9 |
| 77 | 21 |
| 78 | 18 |
| 79 | 14 |
| 80 | 17 |
| 81 | 7 |
| 82 | 21 |
| 83 | 11 |
| 84 | 16 |
| 85 | 19 |
| 86 | 30 |
| 87 | 11 |
| 88 | 18 |
| 89 | 26 |
| 90 | 16 |
+----+-----+
SELECT CEILING(((1+val-(SELECT MIN(val) FROM my_table)))/ROUND((SELECT MAX(val)-MIN(val) FROM my_table)/8)) x
, COUNT(*)
, GROUP_CONCAT(LPAD(val,2,0) ORDER BY val) n
FROM my_table
GROUP
BY CEILING(((1+val-(SELECT MIN(val) FROM my_table)))/ROUND((SELECT MAX(val)-MIN(val) FROM my_table)/8));
+------+----------+----------------------------------------------------+
| x | COUNT(*) | n |
+------+----------+----------------------------------------------------+
| 1 | 5 | 06,07,07,08,09 |
| 2 | 13 | 10,10,10,10,10,11,11,11,11,12,12,12,12 |
| 3 | 11 | 14,14,15,15,16,16,16,16,17,17,17 |
| 4 | 17 | 18,18,18,18,18,18,18,19,19,19,19,20,20,20,20,21,21 |
| 5 | 9 | 22,22,22,22,23,23,24,25,25 |
| 6 | 5 | 26,27,27,28,29 |
| 7 | 2 | 30,31 |
| 8 | 1 | 34 |
| 9 | 1 | 38 |
+------+----------+----------------------------------------------------+
Upvotes: 1
Reputation: 93
If I understand your question correctly, the ranges are unknown and you must determine each dynamically. Find the min and max values as you have done. Using a cursor, calculate the upper and lower bounds for each range and save the results to a temporary table containing fields such as rangeLabel, lowerBound, upperBound. Select the temp table and look up the counts with a subquery:
SELECT rangeLabel, (SELECT Count(*) FROM mytable
WHERE value BETWEEN temp.lowerBound AND temp.Upperbound) as myCount
FROM temp
Upvotes: 0
Reputation: 295
SELECT CASE
WHEN (max(numb) - min(numb) BETWEEN xx AND yy THEN zz
WHEN (max(numb) - min(numb) BETWEEN zz AND aa THEN cc
will handle your sizing of your bar chart and a GROUP BY in another query will give you the values in each distribution.
Upvotes: 0