leodaily
leodaily

Reputation: 97

Summing repeated fields in BigQuery

I will try to explain my problem as clearly as possible, please tell me if it is not.

I have a table [MyTable] that looks like this:

----------------------------------------
|chn:integer | auds:integer (repeated) |
----------------------------------------
|1           |3916                     |
|1           |4983                     |
|1           |6233                     |
|1           |1214                     |
|2           |1200                     |
|2           |900                      |
|2           |2030                     |
|2           |2345                     |
----------------------------------------

Auds is always repeated 4 times.

If I query SELECT chn, auds FROM [MyTable] WHERE chn = 1, I get the following result:

-------------------
|Row | chn | auds |
-------------------
|1   |1    |3916  |
|2   |1    |4983  |  
|3   |1    |6233  |  
|4   |1    |1214  |
-------------------

If I query SELECT chn, auds FROM [MyTable] WHERE (chn = 1 OR chn = 2), I get the following result:

-------------------
|Row | chn | auds |
-------------------
|1   |1    |1200  |
|2   |1    |900   |  
|3   |1    |2030  |  
|4   |2    |2345  |
-------------------

Logically, I get twice as much results, but what I would like to get is the SUM() of the repeated field auds for chn = 1 and chn = 2, or visually, something like this:

-------------------
|Row | chn | auds |
-------------------
|1   |3    |5116  |
|2   |3    |5883  |  
|3   |3    |8263  |  
|4   |3    |3559  |
-------------------

I tried to to something:

SELECT a1+a2 FROM

(SELECT auds AS a1 FROM [MyTable] WHERE chn = 1),
(SELECT auds AS a2 FROM [MyTable] WHERE chn = 2)

But I get the following error:

Error: Cannot query the cross product of repeated fields a1 and a2.

Upvotes: 4

Views: 6264

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Elliott’s answers are always inspiration for me! Please vote and accept his answer if it works for you (it should :o))
Meantime, wanted to add alternative option with Scalar JS UDF

CREATE TEMPORARY FUNCTION mySUM(a ARRAY<INT64>, b ARRAY<INT64>)
RETURNS ARRAY<INT64>
LANGUAGE js AS """
  var sum = [];
  for(var i = 0; i < a.length; i++){
     sum.push(parseInt(a[i]) + parseInt(b[i]));
  }  
  return sum
  """;

WITH MyTable  AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
)
SELECT 
  first_auds.chn AS first_auds_chn, 
  second_auds.chn AS second_auds_chn, 
  mySUM(first_auds.auds, second_auds.auds) AS summed_auds
FROM MyTable AS first_auds
JOIN MyTable AS second_auds
ON first_auds.chn = 1 AND second_auds.chn = 2

I like this option because it less filled with multiple UNNESTs, ARRAYs etc so it is much cleaner to read.

Upvotes: 1

Elliott Brossard
Elliott Brossard

Reputation: 33755

It's much easier to express this sort of logic with standard SQL (uncheck "Use Legacy SQL" under "Show Options"). Here's an example that computes sums over the auds arrays:

WITH MyTable AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
)
SELECT
  chn,
  (SELECT SUM(aud) FROM UNNEST(auds) AS aud) AS auds_sum
FROM MyTable;
+-----+----------+
| chn | auds_sum |
+-----+----------+
|   1 |       20 |
|   2 |       45 |
+-----+----------+

And another that computes pairwise sums for chn = 1 and chn = 2 (which I think is what you wanted based on your question):

WITH MyTable AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
)
SELECT
  ARRAY(SELECT first_aud + second_auds[OFFSET(off)]
        FROM UNNEST(first_auds) AS first_aud WITH OFFSET off)
    AS summed_auds
FROM (
  SELECT
    (SELECT auds FROM MyTable WHERE chn = 1) AS first_auds,
    (SELECT auds FROM MyTable WHERE chn = 2) AS second_auds
);
+---------------------+
|    summed_auds      |
+---------------------+
| [9, 11, 13, 15, 17] |
+---------------------+

Edit: one more example that sums corresponding array elements across all rows. This probably won't be particularly efficient, but it should produce the intended result:

WITH MyTable AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
  UNION ALL SELECT
    3 AS chn,
    [-1, -6, 2, 3, 2] AS auds
)
SELECT
  ARRAY(SELECT
          (SELECT SUM(auds[OFFSET(off)]) FROM UNNEST(all_auds))
        FROM UNNEST(all_auds[OFFSET(0)].auds) WITH OFFSET off)
    AS summed_auds
FROM (
  SELECT
    ARRAY_AGG(STRUCT(auds)) AS all_auds
  FROM MyTable
);
+--------------------+
|    summed_auds     |
+--------------------+
| [8, 5, 15, 18, 19] |
+--------------------+

Upvotes: 7

Dakota Kronberger
Dakota Kronberger

Reputation: 119

Just use GROUP BY in conjunction with SUM.

SELECT SUM(auds), chn FROM [MyTable] GROUP BY chn

Upvotes: 0

Related Questions