Rendell1
Rendell1

Reputation: 11

grouping function for Group by rollup in bigquery standard

I know that bigquery legacy SQL has a grouping() function for group by rollup to distinguish null values.

https://cloud.google.com/bigquery/docs/reference/legacy-sql#example_7

Is there a similar function for standard SQL? Or is there any other way to distinguish null values without first replacing them with hardcoded values?

Upvotes: 0

Views: 3008

Answers (1)

Alexandre Moraes
Alexandre Moraes

Reputation: 4051

According to the documentation, GROUP BY ROLLUP() is available in StandardSQL. It is defined as:

GROUP BY ROLLUP returns the results of GROUP BY for prefixes of the expressions in the ROLLUP list, each of which is known as a grouping set. For the ROLLUP list (a, b, c), the grouping sets are (a, b, c), (a, b), (a), ()

You can use it with the syntax as follows:

SELECT * FROM `project_id.dataset.table`
GROUP BY ROLLUP (column)

Additionally, in order to explain the functionality clearly, I will use the example(taken from the documentation) below:

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT 456, 3, 4.25 UNION ALL
  SELECT 789, 3, 0.99
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

And the output,

+------+------+-------+
| sku  | day  | total |
+------+------+-------+
| NULL | NULL | 39.77 |
|  123 | NULL | 28.97 |
|  123 |    1 | 18.98 |
|  123 |    2 |  9.99 |
|  456 | NULL |  8.81 |
|  456 |    1 |  4.56 |
|  456 |    3 |  4.25 |
|  789 |    3 |  1.99 |
|  789 | NULL |  1.99 |
+------+------+-------+

The output can be explained in 3 parts,

  1. When the sku is null and day is null

It refers to the sum for all sku and day values. In this example, the sum for all the prices across all days and across sku is 39.77

  1. When sku is not null and day is null

It refers to the sum across all the days for an specific sku. For instance, when sku = 123 the sum for all the days is 28.97.

  1. When neither day or sku are null.

It works just as a normal sum and group by. It shows the sum for a specific value pair.

Regarding your second question, if you want to change the null values, you can change each null value accordingly or use another approach. Below I used the WITH to output the same results in a different manner,

WITH Sales AS (
      SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
      SELECT 123, 1, 8.99 UNION ALL
      SELECT 456, 1, 4.56 UNION ALL
      SELECT 123, 2, 9.99 UNION ALL
      SELECT 789, 3, 1.00 UNION ALL
      SELECT 456, 3, 4.25 UNION ALL
      SELECT 789, 3, 0.99
),
sum_all AS (
SELECT sum(price) as sum_all_days_sku from Sales
),
sum_per_sku AS (
SELECT sku, ROUND(sum(price),2) AS sum_per_sku, CONCAT("This sum refers to sku =", sku) AS comment FROM Sales GROUP BY sku
),
sum AS (
SELECT sku, day, sum(price) AS sum_per_sku_per_day FROM Sales GROUP BY sku, day
)
SELECT a.sku,a.day,a.sum_per_sku_per_day, b.sum_per_sku, c.sum_all_days_sku 
FROM sum a LEFT JOIN sum_per_sku b USING(sku) CROSS JOIN sum_all c

And the output,

Row sku day sum_per_sku_per_day sum_per_sku sum_all_days_sku
1   123 1   18.98               28.97       39.77
2   456 1   4.56                8.81        39.77
3   123 2   9.99                28.97       39.77
4   789 3   1.99                1.99        39.77
5   456 3   4.25                8.81        39.77

As it is shown, the output has the same sums as the GROUP BY ROLLUP(). However, it is displayed in columns.

UPDATE:

In case some of the sku rows are null, it will be a little bit confusing if your do not treat the null values before using ROLLUP().

Using the sample data and query as follows,

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

Notice that, there are 5 rows with sku as NULL and the referent sum is 5. And the output,

Row sku  day  total
1   null null 39.53
2   null null 5
3   null 3    5
4   123  null 28.97
5   123  1    18.98
6   123  2    9.99
7   456  null 4.56
8   456  1    4.56
9   789  null 1
10  789  3    1
 

As it is shown above, the second and third output's row are related to the null values not with the output of ROLLUP(). In order to differentiate it, it is necessary to treat these values before using them in the final query. Particularly, I assigned zero where sku is null. Then queried the data, as below:

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1
), 
data as (
SELECT IFNULL(sku , 0) as sku, day, price from Sales
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM data
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

And the output,

Row sku  day  total
1   null null 39.53
2   0    null 5
3   0    3    5
4   123  null 28.97
5   123  1    18.98
6   123  2    9.99
7   456  null 4.56
8   456  1    4.56
9   789  null 1
10  789  3    1

Notice that, now it is easier to understand since it we know that 0 represented null data.

Upvotes: 1

Related Questions