Aggregate same row groups into one row

Question

I have a table with 30 columns. All but two column (CropVariety, Date) values in the same row groups are the same. I want to aggregate groups of rows with the same column values (except CropVariety, Date) into one row, stack different column values into one cell.

Exapmple:

ID      Field      Year IDFarm Farm   Type         CropVariety  Date
1222980 MU-9 - EKO 2020 219255  MU    Cover Crop   Rice         2020-04-03
1222980 MU-9 - EKO 2020 219255  MU    Cover Crop   Soy          2020-04-07
3333333 AB-9 - EKO 2020 444444  AB    Cover Crop   Lentil       2020-05-03
3333333 AB-9 - EKO 2020 444444  AB    Cover Crop   Hemp         2020-05-07


Desired output:

ID      Field      Year IDFarm Farm   Type         CropVariety  Date
1222980 MU-9 - EKO 2020 219255  MU    Cover Crop   Rice, Soy    2020-04-03, 2020-04-07
3333333 AB-9 - EKO 2020 444444  AB    Cover Crop   Lentil, Hemp 2020-05-03, 2020-05-07

I assume i should use ARRAY_AGG or ARRAY_CONCAT_AGG function, but my formulas were always wrong so i don't know really how to use is.

Mikhail Berlyant · Accepted Answer

I have a table with 30 columns.

Typing 30 (or in many practical case even more) column names in SELECT list and GROUP BY clause is always a pain and source for typos, etc.

All but two column (CropVariety, Date) values in the same row groups are the same.

Obviously, would be great to use only those two column names in query

So, below is generic solution for such cases (BigQuery Standard SQL)

#standardSQL
SELECT ANY_VALUE(t).* EXCEPT(CropVariety, Date), 
  STRING_AGG(CropVariety, ', ') AS CropVariety, 
  STRING_AGG(CAST(Date AS STRING), ', ') AS Date
FROM `project.dataset.table` t
GROUP BY TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(CropVariety, Date) FROM UNNEST([t])))

As you can see here - only CropVariety and Date column names are used explicitly - the rest is taken care by code

If to apply to sample data from your question - output is

Row ID          Field           Year    CropVariety     Date     
1   1222980     MU-9 - EKO      2020    Rice, Soy       2020-04-03, 2020-04-07   
2   3333333     AB-9 - EKO      2020    Lentil, Hemp    2020-05-03, 2020-05-07

Aggregate same row groups into one row

Answers (2)

Related Questions