S.Mohsen sh
S.Mohsen sh

Reputation: 2116

Using ARRAY_AGG() with all null values in input

I want to select distinct values of columns for each user in my table (in Google BigQuery). I've thought about using ARRAY_AGG() like:

SELECT user_id, ARRAY_AGG(DISTINCT field1) AS f1, ARRAY_AGG(DISTINCT field2) AS f2
FROM t GROUP BY user_id

But because for some user_ids all values in field1 or field2 are null, I get this error message: Array 'f1' cannot have a null element

I wonder if there is a workaround to avoid this error or maybe a different way to achieve the result without using ARRAY_AGG()

Upvotes: 9

Views: 17828

Answers (4)

Sebastian Kreft
Sebastian Kreft

Reputation: 8189

BigQuery finally implemented suggestion from Elliott Brossard and now you can do:

SELECT user_id, ARRAY_AGG(DISTINCT field1 IGNORE NULLS) AS f1, ARRAY_AGG(DISTINCT field2 IGNORE NULLS) AS f2
FROM t
GROUP BY user_id

Upvotes: 14

Sander van den Oord
Sander van den Oord

Reputation: 12808

When you run into nulls using ARRAY_AGG(), specify IGNORE NULLS:

IGNORE NULLS or RESPECT NULLS: If IGNORE NULLS is specified, the NULL values are excluded from the result. If RESPECT NULLS or if neither is specified, the NULL values are included in the result. An error is raised if an array in the final query result contains a NULL element.

From: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg

Working example using IGNORE NULLS:

SELECT
    category, 
    array_agg(fruit IGNORE NULLS) as fruits_array,
FROM (SELECT "A" as category, NULL as fruit UNION ALL
      SELECT "B" as category, "apple" as fruit UNION ALL
      SELECT "B" as category, "pear" as fruit UNION ALL
      SELECT "C" as category, "orange" as fruit)
GROUP BY category
ORDER BY category

Upvotes: 1

TomasJ
TomasJ

Reputation: 497

From https://cloud.google.com/bigquery/sql-reference/data-types#array-type

BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.

your query is ok in a temporary query but not ok in end query result; so a workaround is you may define your query as a tmp table and filter out NULL values before giving end results:

WITH tmp AS (SELECT user_id,
                    ARRAY_AGG(DISTINCT field1) AS f1,
                    ARRAY_AGG(DISTINCT field2) AS f2
FROM t GROUP BY user_id)

SELECT user_id,
    ARRAY(SELECT el FROM UNNEST(f1) AS el WHERE el IS NOT NULL) AS f1,
    ARRAY(SELECT el FROM UNNEST(f2) AS el WHERE el IS NOT NULL) AS f2
 FROM tmp

I've met same problems when porting over some Postgres SQL into BigQuery, a more elegant solution is FILTER clause on aggregation functions,

https://www.postgresql.org/docs/current/static/sql-expressions.html

like ARRAY_AGG(f1 FILTER WHEN f1 IS NOT NULL) which is not available in BigQuery which I really hope they can implement it

Upvotes: 5

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

WITH t1 AS (
  SELECT user_id, ARRAY_AGG(DISTINCT field1) AS f1
  FROM t WHERE field1 IS NOT NULL
  GROUP BY user_id
),
t2 AS (
  SELECT user_id, ARRAY_AGG(DISTINCT field2) AS f2
  FROM t WHERE field2 IS NOT NULL
  GROUP BY user_id
)
SELECT t1.user_id, f1, f2
FROM t1 FULL JOIN t2 
ON t1.user_id = t2.user_id

Upvotes: 2

Related Questions