Reputation: 110382
Is there a way to accomplish the following in BigQuery? This syntax is supported in a DB such as Postgres:
SELECT ProductGroup, Product, Year, Month, AVG(Revenue)
FROM Sales
group by rollup(ProductGroup, Product), rollup(Year, Month)
In other words, I want the cross product of the two rollups:
ROLLUP(ProductGroup, Product) --> (), (ProductGroup), (ProductGroup, Product)
ROLLUP(Year, Month) --> (), (Year), (Year, Month)
((), (ProductGroup), (ProductGroup, Product)) x ((), (Year), (Year, Month))
= (
(), (ProductGroup), (ProductGroup, Product),
(Year), (Year, ProductGroup), (Year, ProductGroup, Product).
(Year, Month), (Year, Month, ProductGroup), (Year, Month, ProductGroup, Product)
)
When attempting in BQ I get the following error:
The GROUP BY clause only supports ROLLUP when there are no other grouping elements at [2:10]
Here is an update with some sample pictures and data.
First, I want to replicate the functionality of an Excel Pivot table. This is where the cross product of the rollups from ROWS and COLS come into play:
Note that the Pivot table has 63 value cells.
Now, the correct SQL syntax for this is as follows in the verbose GROUP BY
-only syntax:
Notice that this also produces exactly 63 rows (and since we have one value column -- SUM Revenue -- 63 rows x 1 col = 63 value cells). The query is the following:
with sales as (
select 2010 Year, 'Jan' Month, 'Electronics' ProductGroup, 'Phone' Product, 100 Revenue union all
select 2010, 'Jan', 'Electronics', 'Laptop', 200 union all
select 2010, 'Jan', 'Cars', 'Jeep', 250 union all
select 2010, 'Jan', 'Cars', 'Hummer', 105 union all
select 2010, 'Feb', 'Electronics', 'Phone', 110 union all
select 2010, 'Feb', 'Electronics', 'Laptop', 300 union all
select 2010, 'Feb', 'Cars', 'Jeep', 50 union all
select 2010, 'Feb', 'Cars', 'Hummer', 75 union all
select 2010, 'Mar', 'Electronics', 'Phone', 80 union all
select 2010, 'Mar', 'Electronics', 'Laptop', 200 union all
select 2010, 'Mar', 'Cars', 'Jeep', 100 union all
select 2010, 'Mar', 'Cars', 'Hummer', 50 union all
select 2011, 'Jan', 'Electronics', 'Phone', 200 union all
select 2011, 'Jan', 'Electronics', 'Laptop', 300 union all
select 2011, 'Jan', 'Cars', 'Jeep', 100 union all
select 2011, 'Jan', 'Cars', 'Hummer', 200 union all
select 2011, 'Feb', 'Electronics', 'Phone', 300 union all
select 2011, 'Feb', 'Electronics', 'Laptop', 900 union all
select 2011, 'Feb', 'Cars', 'Jeep', 100 union all
select 2011, 'Feb', 'Cars', 'Hummer', 200 union all
select 2011, 'Mar', 'Electronics', 'Phone', 400 union all
select 2011, 'Mar', 'Electronics', 'Laptop', 350 union all
select 2011, 'Mar', 'Cars', 'Jeep', 240 union all
select 2011, 'Mar', 'Cars', 'Hummer', 130
)
-- ROLLUP(ProductGroup, Product), ROLLUP(Year, Month)
--> (), (ProductGroup), (ProductGroup, Product)
--> (Year), (Year, ProductGroup), (Year, ProductGroup, Product)
--> (Year, Month), (Year, Month, ProductGroup), (Year, Month, ProductGroup, Product)
SELECT NULL, NULL, NULL, NULL, AVG(Revenue) FROM Sales UNION ALL -- ()
SELECT ProductGroup, NULL, NULL, NULL, AVG(Revenue) FROM Sales GROUP BY ProductGroup UNION ALL -- (ProductGroup)
SELECT ProductGroup, Product, NULL, NULL, AVG(Revenue) FROM Sales GROUP BY ProductGroup, Product UNION ALL -- (ProductGroup, Product)
SELECT NULL, NULL, Year, NULL, AVG(Revenue) FROM Sales GROUP BY Year UNION ALL -- (Year)
SELECT ProductGroup, NULL, Year, NULL, AVG(Revenue) FROM Sales GROUP BY Year, ProductGroup UNION ALL -- (Year, ProductGroup)
SELECT ProductGroup, Product, Year, NULL, AVG(Revenue) FROM Sales GROUP BY Year, ProductGroup, Product UNION ALL-- (Year, ProductGroup, Product)
SELECT NULL, NULL, Year, Month, AVG(Revenue) FROM Sales GROUP BY Year, Month UNION ALL -- (Year, Month)
SELECT ProductGroup, NULL, Year, Month, AVG(Revenue) FROM Sales GROUP BY ProductGroup, Year, Month UNION ALL -- (ProductGroup, Year, Month)
SELECT ProductGroup, Product, Year, Month, AVG(Revenue) FROM Sales GROUP BY ProductGroup, Product, Year, Month -- (ProductGroup, Product, Year Month)
However, this query is really a nightmare to product -- even if generated programatically -- as there may be an order by
, subselect
, ... etc and union-ing all those statements together could potentially turn into a monstrous construction (for example, a 3 rows x 3 cols construction with a 100-line SQL statement would become 4^2 * 100 lines of sql, and 5x5 would be 5^2 * 100 lines, etc. if my math is correct).
What would be the proper way to do this then? Note that in a database like Postgres the following works as-is:
SELECT ProductGroup, Product, Year, Month, AVG(Revenue) FROM Sales GROUP BY ROLLUP(ProductGroup, Product), ROLLUP(Year, Month);
Here is the Saved Query if you want to use this as a starting point: https://console.cloud.google.com/bigquery?sq=260144861653:552549d2a81a47b59df6e3d16ef9bf17.
Finally, if you think it would be a useful feature to add in the GROUPING SETS
and CUBE
, please upvote this feature request: https://issuetracker.google.com/issues/204913323.
Upvotes: 6
Views: 1311
Reputation: 86765
Back to my original answer of grouping twice...
WITH
rollup_pg_p AS
(
SELECT
ProductGroup, Product, 1 AS dummy, Year, Month, SUM(Revenue) AS sum_rev, COUNT(Revenue) AS cnt_row
FROM
Sales
GROUP BY
ROLLUP(Year, Month, ProductGroup, Product)
HAVING
Month IS NOT NULL -- This prevents the roll up going further than desired
-- Effectively giving `GROUP BY Year, Month, ROLLUP(ProductGroup, Product)
)
SELECT
ProductGroup, Product, Year, Month, SUM(sum_rev) / SUM(cnt_row)
FROM
rollup_pg_p
GROUP BY
ROLLUP(ProductGroup, Product, Dummy, Year, Month)
HAVING
dummy IS NOT NULL -- Same 'trick' again, but we created the dummy column
-- as ProductGroup and Product CAN legitimately be NULL at this point.
ORDER BY
1, 2, 3, 4
(NOTE: Other dialects would use WHERE NOT GROUPING(Product)
, so avoid the need for the dummy column, but BigQuery doesn't appear to have that functionality either...)
Still has the downside of not working for some aggregates, but might be meaningfully faster than the alternative approaches.
Upvotes: 1
Reputation: 86765
Similar to my previous answer, but without CASE
expressions 'blocking' use of indexes.
It still processes 9x as much data though, but probably more quickly than the CASE
based approach.
WITH
rollup_pg_p AS
(
SELECT ProductGroup, Product, Year, Month, Revenue FROM Sales
UNION ALL SELECT ProductGroup, NULL, Year, Month, Revenue FROM Sales
UNION ALL SELECT NULL, NULL, Year, Month, Revenue FROM Sales
),
rollup_y_m AS
(
SELECT ProductGroup, Product, Year, Month, Revenue FROM rollup_pg_p
UNION ALL SELECT ProductGroup, Product, Year, NULL, Revenue FROM rollup_pg_p
UNION ALL SELECT ProductGroup, Product, NULL, NULL, Revenue FROM rollup_pg_p
)
SELECT
ProductGroup, Product, Year, Month, AVG(Revenue) FROM rollup_y_m
GROUP BY
1, 2, 3, 4
ORDER BY
1, 2, 3, 4
Edit: An elaboration.
Your query is this (-
is my shorthand for NULL
in pseudocode)...
SELECT a, b, c, d, AVG(x) FROM src GROUP BY a, b, c, d
UNION ALL SELECT a, b, c, -, AVG(x) FROM src GROUP BY a, b, c
UNION ALL SELECT a, b, -, -, AVG(x) FROM src GROUP BY a, b
UNION ALL SELECT a, -, c, d, AVG(x) FROM src GROUP BY a, c, d
UNION ALL SELECT a, -, c, -, AVG(x) FROM src GROUP BY a, c
UNION ALL SELECT a, -, -, -, AVG(x) FROM src GROUP BY a
UNION ALL SELECT -, -, c, d, AVG(x) FROM src GROUP BY c, d
UNION ALL SELECT -, -, c, -, AVG(x) FROM src GROUP BY c
UNION ALL SELECT -, -, -, -, AVG(x) FROM src
Which is functionally the same as this...
WITH
combinations AS (
SELECT a, b, c, d, x FROM src
UNION ALL SELECT a, b, c, -, x FROM src
UNION ALL SELECT a, b, -, -, x FROM src
UNION ALL SELECT a, -, c, d, x FROM src
UNION ALL SELECT a, -, c, -, x FROM src
UNION ALL SELECT a, -, -, -, x FROM src
UNION ALL SELECT -, -, c, d, x FROM src
UNION ALL SELECT -, -, c, -, x FROM src
UNION ALL SELECT -, -, -, -, x FROM src
)
SELECT a, b, c, d, AVG(x) FROM combinations GROUP BY a, b, c, d
The advantage of the latter is that the aggregate (or aggregates) that you want to apply are only written once, as is the GROUP BY
.
This still requires enumerating all 9 combinations.
So, the answer at the start is just a short hand way of enumerating the 9 combinations. Not much shorter, but slightly. And would be even more valuable if you needed ROLLUP(a, b), ROLLUP(c, d), ROLLUP(e, f)
(Writing 3 combinations for each ROLLUP()
, for 9 in total, to generate 27 combinations.)
Upvotes: 1
Reputation: 86765
EDIT: I misread that GROUP BY ROLLUP(A, B), C, D
was possible, here's an alternative.
You could implement your own GROUPING SETS
logic by cross joining on to a map of which columns you want to rollup or not...
SELECT
CASE WHEN include_pg = 1 THEN ProductGroup END,
CASE WHEN include_p = 1 THEN Product END,
CASE WHEN include_y = 1 THEN Year END,
CASE WHEN include_m = 1 THEN Month END,
AVG(Revenue)
FROM
Sales
CROSS JOIN
(
SELECT 1 AS include_pg, 1 AS include_p
UNION ALL SELECT 1 AS include_pg, 0 AS include_p
UNION ALL SELECT 0 AS include_pg, 0 AS include_p
)
AS rollup_pg_p
CROSS JOIN
(
SELECT 1 AS include_y, 1 AS include_m
UNION ALL SELECT 1 AS include_y, 0 AS include_m
UNION ALL SELECT 0 AS include_y, 0 AS include_m
)
AS rollup_y_m
GROUP BY
1, 2, 3, 4
ORDER BY
1, 2, 3, 4
Upvotes: 2
Reputation: 173046
it gives the correct number of rows but how would I pass along the AVG(Revenue)?
Consider below - looks like a simple pattern to apply to more of your potential cases
select r.Year, r.Month, r.ProductGroup, r.Product,
round(avg(Revenue), 2) avg_Revenue
from (
select * from (
select ProductGroup, Product from Sales group by rollup(ProductGroup, Product)
), (
select Year, Month from Sales group by rollup(Year, Month)
)
) r
join sales s
on if(r.Year is null, true, r.Year = s.Year)
and if(r.Month is null, true, r.Month = s.Month)
and if(r.ProductGroup is null, true, r.ProductGroup = s.ProductGroup)
and if(r.Product is null, true, r.Product = s.Product)
group by r.Year, r.Month, r.ProductGroup, r.Product
if applied to sample data in your script - output is (cut by top rows)
Upvotes: 1
Reputation: 24603
ugly but probably the simplest way to have 3 group by statement and union them :
SELECT ProductGroup,Product,NULL year ,NULL month, AVG(sales.Revenue) avg
FROM sales
GROUP BY ROLLUP(ProductGroup,Product)
UNION DISTINCT
SELECT ProductGroup,Product,Year,NULL month, AVG(sales.Revenue) avg
FROM sales
GROUP BY ROLLUP(Year, ProductGroup, Product)
UNION DISTINCT
SELECT ProductGroup,Product,Year,MONTH, AVG(sales.Revenue) avg
FROM sales
GROUP BY ROLLUP(Year, Month, ProductGroup, Product)
Upvotes: 1
Reputation: 173046
I want the cross product of the two rollups:
Consider below
select * from (
select date, code
from `first-outlet-750.tests.parq_stored`
group by rollup(date, code)
), (
select country, state
from `first-outlet-750.tests.parq_stored`
group by rollup(country, state))
with output like below
Upvotes: 0