Reputation: 2116
I want to select distinct values of columns for each user in my table (in Google BigQuery). I've thought about using ARRAY_AGG()
like:
SELECT user_id, ARRAY_AGG(DISTINCT field1) AS f1, ARRAY_AGG(DISTINCT field2) AS f2
FROM t GROUP BY user_id
But because for some user_id
s all values in field1
or field2
are null, I get this error message: Array 'f1' cannot have a null element
I wonder if there is a workaround to avoid this error or maybe a different way to achieve the result without using ARRAY_AGG()
Upvotes: 9
Views: 17828
Reputation: 8189
BigQuery finally implemented suggestion from Elliott Brossard and now you can do:
SELECT user_id, ARRAY_AGG(DISTINCT field1 IGNORE NULLS) AS f1, ARRAY_AGG(DISTINCT field2 IGNORE NULLS) AS f2
FROM t
GROUP BY user_id
Upvotes: 14
Reputation: 12808
When you run into nulls using ARRAY_AGG(), specify IGNORE NULLS:
IGNORE NULLS or RESPECT NULLS: If IGNORE NULLS is specified, the NULL values are excluded from the result. If RESPECT NULLS or if neither is specified, the NULL values are included in the result. An error is raised if an array in the final query result contains a NULL element.
From: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg
Working example using IGNORE NULLS:
SELECT
category,
array_agg(fruit IGNORE NULLS) as fruits_array,
FROM (SELECT "A" as category, NULL as fruit UNION ALL
SELECT "B" as category, "apple" as fruit UNION ALL
SELECT "B" as category, "pear" as fruit UNION ALL
SELECT "C" as category, "orange" as fruit)
GROUP BY category
ORDER BY category
Upvotes: 1
Reputation: 497
From https://cloud.google.com/bigquery/sql-reference/data-types#array-type
BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
your query is ok in a temporary query but not ok in end query result; so a workaround is you may define your query as a tmp table and filter out NULL values before giving end results:
WITH tmp AS (SELECT user_id,
ARRAY_AGG(DISTINCT field1) AS f1,
ARRAY_AGG(DISTINCT field2) AS f2
FROM t GROUP BY user_id)
SELECT user_id,
ARRAY(SELECT el FROM UNNEST(f1) AS el WHERE el IS NOT NULL) AS f1,
ARRAY(SELECT el FROM UNNEST(f2) AS el WHERE el IS NOT NULL) AS f2
FROM tmp
I've met same problems when porting over some Postgres SQL into BigQuery, a more elegant solution is FILTER clause on aggregation functions,
https://www.postgresql.org/docs/current/static/sql-expressions.html
like ARRAY_AGG(f1 FILTER WHEN f1 IS NOT NULL)
which is not available in BigQuery which I really hope they can implement it
Upvotes: 5
Reputation: 172993
WITH t1 AS (
SELECT user_id, ARRAY_AGG(DISTINCT field1) AS f1
FROM t WHERE field1 IS NOT NULL
GROUP BY user_id
),
t2 AS (
SELECT user_id, ARRAY_AGG(DISTINCT field2) AS f2
FROM t WHERE field2 IS NOT NULL
GROUP BY user_id
)
SELECT t1.user_id, f1, f2
FROM t1 FULL JOIN t2
ON t1.user_id = t2.user_id
Upvotes: 2