Reputation: 97
I have my schema similar to the below
|- name
+- cars
| |- tesla integer
| |- ferrari integer
Let's say every record specifies an order for cars and every order specifies the number of different cars that were ordered.
Now I want to query the table such that I get the sum of all the different car types.
So I want something like SELECT SUM(cars.*) from table_name
, because I don't know what all possible nested fields the cars might have.
The schema gets dynamically generated every time a new car model is added, and thus in my query I can't possibly specify all the field names because I won't be having them.
example data -
name | cars.tesla | cars.ferrari
vendor1 | 12 | 10
vendor1 | 5 | 5
vendor2 | 4 | 3
desired output for vendor1 -
name | total_tesla | total_ferrari
vendor1 | 17 | 15
So, I want to select sum of all the fields that are nested under the particular record. Is there a way I can do that?
Upvotes: 1
Views: 1788
Reputation: 173046
Below is for BigQuery Standard SQL
#standardSQL
SELECT name, SUM(SAFE_CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64)) total
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(JSON_EXTRACT(TO_JSON_STRING(t), '$.cars'), r'^{|}$', ''))) kv
GROUP BY name
You can test / play with above using dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' name, STRUCT<tesla INT64, ferrari INT64>(1, 2) cars UNION ALL
SELECT 'b', STRUCT(3,4)
)
SELECT name, SUM(SAFE_CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64)) total
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(JSON_EXTRACT(TO_JSON_STRING(t), '$.cars'), r'^{|}$', ''))) kv
GROUP BY name
result is
name total
a 3
b 7
Below is simplified / refactored version of above:
#standardSQL
SELECT name,
(
SELECT SUM(SAFE_CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64))
FROM UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(cars), r'^{|}$', ''))) kv
) total
FROM `project.dataset.table`
And finally "final" version:
#standardSQL
CREATE TEMP FUNCTION SUM_NESTED(root STRING) AS (
(SELECT SUM(SAFE_CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64))
FROM UNNEST(SPLIT(REGEXP_REPLACE(root, r'^{|}$', ''))) kv)
);
SELECT name, SUM_NESTED(TO_JSON_STRING(cars)) total
FROM `project.dataset.table`
Update to address edited question
Below should give you direction - it gives you flatten result (you should search this site then for how to pivot it - there are plenty of questions/answers on that topic here
)
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'vendor1' name, STRUCT<tesla INT64, ferrari INT64>(12, 10) cars UNION ALL
SELECT 'vendor1', STRUCT(5, 5) UNION ALL
SELECT 'vendor2', STRUCT(4,3) UNION ALL
SELECT 'vendor2', STRUCT(1,NULL)
)
SELECT
name,
REPLACE(SPLIT(kv, ':')[OFFSET(0)], '"', '') car,
SUM(SAFE_CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64)) total
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(cars), r'^{|}$', ''))) kv
GROUP BY name, car
-- ORDER BY name, car
result is
name car total
vendor1 ferrari 15
vendor1 tesla 17
vendor2 ferrari 3
vendor2 tesla 5
Upvotes: 3