Reputation: 11
I know that bigquery legacy SQL has a grouping() function for group by rollup to distinguish null values.
https://cloud.google.com/bigquery/docs/reference/legacy-sql#example_7
Is there a similar function for standard SQL? Or is there any other way to distinguish null values without first replacing them with hardcoded values?
Upvotes: 0
Views: 3008
Reputation: 4051
According to the documentation, GROUP BY ROLLUP() is available in StandardSQL. It is defined as:
GROUP BY ROLLUP returns the results of GROUP BY for prefixes of the expressions in the ROLLUP list, each of which is known as a grouping set. For the ROLLUP list (a, b, c), the grouping sets are (a, b, c), (a, b), (a), ()
You can use it with the syntax as follows:
SELECT * FROM `project_id.dataset.table`
GROUP BY ROLLUP (column)
Additionally, in order to explain the functionality clearly, I will use the example(taken from the documentation) below:
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT 456, 3, 4.25 UNION ALL
SELECT 789, 3, 0.99
)
SELECT
sku,
day,
SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;
And the output,
+------+------+-------+
| sku | day | total |
+------+------+-------+
| NULL | NULL | 39.77 |
| 123 | NULL | 28.97 |
| 123 | 1 | 18.98 |
| 123 | 2 | 9.99 |
| 456 | NULL | 8.81 |
| 456 | 1 | 4.56 |
| 456 | 3 | 4.25 |
| 789 | 3 | 1.99 |
| 789 | NULL | 1.99 |
+------+------+-------+
The output can be explained in 3 parts,
sku
is null and day is nullIt refers to the sum for all sku
and day
values. In this example, the sum for all the prices across all days and across sku is 39.77
sku
is not null and day
is nullIt refers to the sum across all the days
for an specific sku
. For instance, when sku = 123
the sum for all the days is 28.97.
day
or sku
are null.It works just as a normal sum and group by. It shows the sum for a specific value pair.
Regarding your second question, if you want to change the null values, you can change each null value accordingly or use another approach. Below I used the WITH to output the same results in a different manner,
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT 456, 3, 4.25 UNION ALL
SELECT 789, 3, 0.99
),
sum_all AS (
SELECT sum(price) as sum_all_days_sku from Sales
),
sum_per_sku AS (
SELECT sku, ROUND(sum(price),2) AS sum_per_sku, CONCAT("This sum refers to sku =", sku) AS comment FROM Sales GROUP BY sku
),
sum AS (
SELECT sku, day, sum(price) AS sum_per_sku_per_day FROM Sales GROUP BY sku, day
)
SELECT a.sku,a.day,a.sum_per_sku_per_day, b.sum_per_sku, c.sum_all_days_sku
FROM sum a LEFT JOIN sum_per_sku b USING(sku) CROSS JOIN sum_all c
And the output,
Row sku day sum_per_sku_per_day sum_per_sku sum_all_days_sku
1 123 1 18.98 28.97 39.77
2 456 1 4.56 8.81 39.77
3 123 2 9.99 28.97 39.77
4 789 3 1.99 1.99 39.77
5 456 3 4.25 8.81 39.77
As it is shown, the output has the same sums as the GROUP BY ROLLUP(). However, it is displayed in columns.
UPDATE:
In case some of the sku
rows are null, it will be a little bit confusing if your do not treat the null values before using ROLLUP().
Using the sample data and query as follows,
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1
)
SELECT
sku,
day,
SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;
Notice that, there are 5 rows with sku
as NULL
and the referent sum is 5. And the output,
Row sku day total
1 null null 39.53
2 null null 5
3 null 3 5
4 123 null 28.97
5 123 1 18.98
6 123 2 9.99
7 456 null 4.56
8 456 1 4.56
9 789 null 1
10 789 3 1
As it is shown above, the second and third output's row are related to the null values not with the output of ROLLUP()
. In order to differentiate it, it is necessary to treat these values before using them in the final query. Particularly, I assigned zero where sku
is null. Then queried the data, as below:
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1 UNION ALL
SELECT null, 3, 1
),
data as (
SELECT IFNULL(sku , 0) as sku, day, price from Sales
)
SELECT
sku,
day,
SUM(price) AS total
FROM data
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;
And the output,
Row sku day total
1 null null 39.53
2 0 null 5
3 0 3 5
4 123 null 28.97
5 123 1 18.98
6 123 2 9.99
7 456 null 4.56
8 456 1 4.56
9 789 null 1
10 789 3 1
Notice that, now it is easier to understand since it we know that 0
represented null data.
Upvotes: 1