Reputation: 419
I have a table with 30 columns. All but two column (CropVariety, Date) values in the same row groups are the same. I want to aggregate groups of rows with the same column values (except CropVariety, Date) into one row, stack different column values into one cell.
Exapmple:
ID Field Year IDFarm Farm Type CropVariety Date
1222980 MU-9 - EKO 2020 219255 MU Cover Crop Rice 2020-04-03
1222980 MU-9 - EKO 2020 219255 MU Cover Crop Soy 2020-04-07
3333333 AB-9 - EKO 2020 444444 AB Cover Crop Lentil 2020-05-03
3333333 AB-9 - EKO 2020 444444 AB Cover Crop Hemp 2020-05-07
Desired output:
ID Field Year IDFarm Farm Type CropVariety Date
1222980 MU-9 - EKO 2020 219255 MU Cover Crop Rice, Soy 2020-04-03, 2020-04-07
3333333 AB-9 - EKO 2020 444444 AB Cover Crop Lentil, Hemp 2020-05-03, 2020-05-07
I assume i should use ARRAY_AGG or ARRAY_CONCAT_AGG function, but my formulas were always wrong so i don't know really how to use is.
Upvotes: 0
Views: 336
Reputation: 5503
Update: check Mikhail's answer for the syntax to specify columns in batch.
======= You said ARRAY_AGG but your expected results look like a string. You can use query below (and replace STRING_AGG() with ARRAY_AGG() if you do want array)
WITH data AS (
SELECT 1222980 ID, 'MU-9 - EKO' Field, 2020 Year, "Rice" CropVariety, DATE('2020-04-03') Date
UNION ALL
SELECT 1222980 ID, 'MU-9 - EKO' Field, 2020 Year, "Soy" CropVariety, DATE('2020-04-07') Date
UNION ALL
SELECT 3333333 ID, 'AB-9 - EKO' Field, 2020 Year, "Lentil" CropVariety, DATE('2020-05-03') Date
UNION ALL
SELECT 3333333 ID, 'AB-9 - EKO' Field, 2020 Year, "Hemp" CropVariety, DATE('2020-05-07') Date
)
SELECT ID, Field, Year, string_agg(CropVariety, ', '), string_agg(CAST(Date AS STRING), ', ')
FROM data
GROUP BY 1,2,3
Output:
+---------+------------+------+--------------+------------------------+
| ID | Field | Year | f0_ | f1_ |
+---------+------------+------+--------------+------------------------+
| 1222980 | MU-9 - EKO | 2020 | Rice, Soy | 2020-04-03, 2020-04-07 |
| 3333333 | AB-9 - EKO | 2020 | Lentil, Hemp | 2020-05-03, 2020-05-07 |
+---------+------------+------+--------------+------------------------+
Upvotes: 1
Reputation: 173003
I have a table with 30 columns.
Typing 30 (or in many practical case even more) column names in SELECT list and GROUP BY clause is always a pain and source for typos, etc.
All but two column (CropVariety, Date) values in the same row groups are the same.
Obviously, would be great to use only those two column names in query
So, below is generic solution for such cases (BigQuery Standard SQL)
#standardSQL
SELECT ANY_VALUE(t).* EXCEPT(CropVariety, Date),
STRING_AGG(CropVariety, ', ') AS CropVariety,
STRING_AGG(CAST(Date AS STRING), ', ') AS Date
FROM `project.dataset.table` t
GROUP BY TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(CropVariety, Date) FROM UNNEST([t])))
As you can see here - only CropVariety
and Date
column names are used explicitly - the rest is taken care by code
If to apply to sample data from your question - output is
Row ID Field Year CropVariety Date
1 1222980 MU-9 - EKO 2020 Rice, Soy 2020-04-03, 2020-04-07
2 3333333 AB-9 - EKO 2020 Lentil, Hemp 2020-05-03, 2020-05-07
Upvotes: 2