Reputation: 329
I have a BigQuery SQL below. The source
table is aggregated from another table and field x
contains distinct values. To further aggregate on field name
, the query unnests x
and uses array_agg(distinct x)
to merge the x
arrays. It works, but it takes two steps to aggregate x
and y
separately. How to obtain the same results in a single step of aggregation on name
?
with source AS (
SELECT [1, 2, 3, 4] AS x, 10 AS y, 'john' AS name
UNION ALL
SELECT [3, 4, 5, 6], 3.3, 'john'
UNION ALL
SELECT [7, 8, 9], 12.7, 'jacob'
),
agg_x AS (
select name, array_agg(distinct x) AS distinct_x_array
from source, unnest(x) x
group by name
),
agg_y AS (
select name, sum(y) sum_y
from source
group by name
)
select name, distinct_x_array, sum_y
FROM agg_x join agg_y using (name)
Upvotes: 1
Views: 887
Reputation: 173171
Even better option (pure "single step")
select name, array_agg(distinct x) distinct_x_array, sum(if(offset=0,y,0)) sum_y
from source, unnest(x) x with offset
group by name
with output
Upvotes: 2
Reputation: 173171
Consider below approach (no joins involved)
select name, sum_y,
array(select distinct x from t.x_array x) as distinct_x_array
from (
select name, array_concat_agg(x) x_array, sum(y) sum_y
from source group by name
) t
if applied to sample data in your question - output is
Upvotes: 2