Reputation: 227
I have products sold at different prices. I want to see how many products were sold in a particular price range. For this, I need to go through the data and see how to divide the ranges and then get the count of products in that range.
Data looks like below-
Product Price sold
A 4.5
B 45.7
C 20
D 20.1
E 36.8
F 50
For example, for the above data I see that the min is 4.5 and max is 50. So, I decide to divide the price range like- 0-10$, 11-20$, 21-30$, 30-40$, 40-50$
So, the result should look like-
Range No. of products sold
0-10 1
11-20 2
21-30 0
30-40 1
40-50 2
The prices are in float so, the ranges should take care of the float values. Is this possible?
Upvotes: 1
Views: 1306
Reputation: 59225
None of the current answers seem to answer the question: "How do I generate the range" (as both answers assume a range 0-50).
What you seem to want is an histogram, and you can find that answer here:
Now, if you want round steps between each bucket:
WITH data AS (
SELECT * FROM `fh-bigquery.public_dump.gdp_capita`
), min_and_max AS (
SELECT MIN(gdp_capita) min, MAX(gdp_capita) max
FROM data
), generate_buckets AS (
SELECT x bucket_min
, IFNULL(LEAD(x) OVER(ORDER BY x), 1+(SELECT max FROM min_and_max)) bucket_max
FROM UNNEST(generate_array(
(SELECT 0 FROM min_and_max) # min or 0, depending on your start
, (SELECT max FROM min_and_max)
, (SELECT POW(10, fhoffa.x.int(LOG10(max-min)))/10 FROM min_and_max) # log10 for round order of 10 steps
)) x
)
SELECT *
FROM generate_buckets
With those buckets, you can get an histogram now:
SELECT bucket_min, bucket_max, COUNT(*) c
FROM generate_buckets
JOIN data
ON data.gdp_capita >= bucket_min AND data.gdp_capita < bucket_max
GROUP BY 1,2
ORDER BY 1
If you also need the buckets with 0 elements:
SELECT * REPLACE(IFNULL(c,0) AS c)
FROM (
SELECT bucket_min, bucket_max, COUNT(*) c
FROM generate_buckets
JOIN data
ON data.gdp_capita >= bucket_min AND data.gdp_capita < bucket_max AND data.one=generate_buckets.one
GROUP BY 1,2
ORDER BY 1
)
RIGHT JOIN generate_buckets USING(bucket_min, bucket_max)
Upvotes: 2
Reputation: 173046
Below is for BigQuery Standard SQL
#standardSQL
WITH price_ranges AS (
SELECT '0-10' price_range UNION ALL
SELECT '11-20' UNION ALL
SELECT '21-30' UNION ALL
SELECT '30-40' UNION ALL
SELECT '40-50'
)
SELECT price_range, COUNT(1) number_sold
FROM `project.dataset.table`
JOIN price_ranges
ON CAST(price_sold AS INT64)
BETWEEN CAST(SPLIT(price_range, '-')[OFFSET(0)] AS INT64)
AND CAST(SPLIT(price_range, '-')[OFFSET(1)] AS INT64)
GROUP BY price_range
-- ORDER BY price_range
If to apply to sample data from your question - result is
Row price_range number_sold
1 0-10 1
2 11-20 2
3 30-40 1
4 40-50 2
Upvotes: 1
Reputation: 1270431
You can use generate_array()
. I would phrase this as:
select lb, lb + 10 as ub, count(d.product)
from unnest(generate_array(0, 50, 10)) lb left join
data d
on d.price >= lb and
d.price < lb + 10
group by lb
order by lb;
You can concatenate the lower bound and upper bound together, but it seems just a useful to keep them in two columns.
Upvotes: 1