Reputation: 33

Group by range of values in bigquery

Is there any way in Bigquery to group by not the absolute value but a range of values?

I have a query that looks in a product table with 4 different numeric group by's. What I am looking for is an efficient way to group by in a way like: group by "A±1000" etc. or "A±10%ofA".

thanks in advance,

Upvotes: 1

Answers (3)

Mikhail Berlyant

Reputation: 173046

Below example is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.example` AS (
  SELECT * FROM 
  UNNEST([STRUCT<id INT64, price FLOAT64>
    (1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
  ])
)
SELECT 
  CASE
    WHEN price > 0   AND price <= 100 THEN '  0 - 100'
    WHEN price > 100 AND price <= 200 THEN '100 - 200'
    ELSE '200+'
  END AS range_group, 
  COUNT(1) AS cnt 
FROM `project.dataset.example`
GROUP BY range_group
-- ORDER BY range_group

with result

Row range_group cnt  
1   0 - 100     2    
2   100 - 200   3    
3   200+        1

As you can see, in above solution you need construct CASE statement to reflect your ranges - if you have multiple - this can be quite boring - so below is more generic (but more verbose) solution - and it uses recently introduced RANGE_BUCKET function

#standardSQL
WITH `project.dataset.example` AS (
  SELECT * FROM 
  UNNEST([STRUCT<id INT64, price FLOAT64>
    (1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
  ])
), ranges AS (
  SELECT [100.0, 200.0] ranges_array
), temp AS (
  SELECT OFFSET, IF(prev_val = val, CONCAT(prev_val, ' - '), CONCAT(prev_val, ' - ', val)) rng FROM (
    SELECT OFFSET, IFNULL(CAST(LAG(val) OVER(ORDER BY OFFSET) AS STRING), '') prev_val, CAST(val AS STRING) AS val
    FROM ranges, UNNEST(ARRAY_CONCAT(ranges_array, [ARRAY_REVERSE(ranges_array)[OFFSET(0)]])) val WITH OFFSET
  )
)
SELECT 
  RANGE_BUCKET(price, ranges_array) range_group, 
  rng, 
  COUNT(1) AS cnt 
FROM `project.dataset.example`, ranges
JOIN temp ON RANGE_BUCKET(price, ranges_array) = OFFSET
GROUP BY range_group, rng
-- ORDER BY range_group

with result

Row range_group rng         cnt  
1   0               - 100   2    
2   1           100 - 200   3    
3   2           200 -       1

As you can see, in second solution you need to define your your ranges in ranges as simple array enlisting your boundaries as SELECT [100.0, 200.0] ranges_array
Then temp does all needed calculation

Upvotes: 3

Felipe Hoffa

Reputation: 59225

You can do math operations on the GROUP BY, creating groups by any arbitrary criteria.

For example:

WITH data AS (
  SELECT repo.name, COUNT(*) price
  FROM `githubarchive.month.201909` 
  GROUP BY 1
  HAVING price>100
)


SELECT FORMAT('range %i-%i', MIN(price), MAX(price)) price_range,  COUNT(*) c
FROM data
GROUP BY CAST(LOG(price) AS INT64)
ORDER BY MIN(price)

Upvotes: 0

Yun Zhang

Reputation: 5518

You can generate a column as a "named range" then group by the column. As an example for your A+-1000 case:

with data as ( 
select 100 as v union all
select 200 union all
select 2000 union all
select 2100 union all
select 2200 union all
select 4100 union all
select 8000 union all
select 8000
)
select count(v), ARRAY_AGG(v), ranges
FROM data, unnest([0, 2000, 4000, 6000, 8000]) ranges
WHERE data.v >= ranges - 1000 AND data.v < ranges + 1000
GROUP BY ranges

Output:

+-----+------------------------+--------+
| f0_ |          f1_           | ranges |
+-----+------------------------+--------+
|   2 |          ["100","200"] |      0 |
|   3 | ["2000","2100","2200"] |   2000 |
|   1 |               ["4100"] |   4000 |
|   2 |        ["8000","8000"] |   8000 |
+-----+------------------------+--------+

Upvotes: 5

Group by range of values in bigquery

Answers (3)

Related Questions