DarioB
DarioB

Reputation: 1609

bigquery group by all columns except a few

I have a table with loads of fields, and I am trying to group by all except two values which I am summing on. I would like to do something like

SELECT my_table.* except(value_1, value_2)
    , sum(value_1)
    , sum(value_2)
FROM my_table
GROUP BY my_table.* except(value_1, value_2)

But unfortunately GROUP BY my_table.* except(value_1, value_2) do not work. Any suggestions please?

Upvotes: 10

Views: 15225

Answers (2)

Jeroen
Jeroen

Reputation: 49

BigQuery has since released the GROUP BY ALL clause (currently in preview):
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_all

This allows you to rewrite your original suggestion like so:

SELECT
    my_table.* EXCEPT(value_1, value_2),
    SUM(value_1),
    SUM(value_2),
FROM my_table
GROUP BY ALL

(BigQuery has also supported trailing commas for years now so I rewrote your SELECT slightly)

The clause automatically excludes expressions that include aggregate functions so this works perfectly for your use case.

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
  SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
  SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
  SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
  FROM `project.dataset.table` t
)

You can test, play with above using dummy data as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
  SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
  SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
  SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
  FROM `project.dataset.table` t
)

with result as

Row value_3 value_4 sum_value_1 sum_value_2  
1   3       14      32          34   
2   3       4       1           2    

Above will work with any number of columns and you don't need to reference them all explicitly - only those columns to be excluded to be explicitly referenced - value_1 and value_2 in this example

Upvotes: 12

Related Questions