Rentian Dong
Rentian Dong

Reputation: 101

Google BigQuery Check If One Array is Superset/Subset of another

Given two arrays in Google BigQuery, I need to figure out whether ALL elements in one array are contained in the other.

As an example, I am trying to complete the following query:

WITH
  ArrayA AS (
  SELECT
    [1, 2, 3] arrA,
  UNION ALL
  SELECT
    [4, 5, 6])
  ArrayB AS (
  SELECT
    [1, 2, 3, 4, 5] arrB)
SELECT
  *
FROM
  ArrayA
CROSS JOIN
  ArrayB
WHERE
  <your code goes here>

such that the result looks like

arrA    | arrB
[1,2,3] | [1,2,3,4,5]

, since [1,2,3,4,5] is a superset of [1,2,3] but not a superset of [4,5,6].

Many thanks in advance.

Upvotes: 4

Views: 2993

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269673

I think these conditions do what you want:

WITH
  ArrayA AS (
  SELECT ARRAY[1, 2, 3] arrA,
  UNION ALL
  SELECT ARRAY[4, 5, 6]),
  ArrayB AS (
  SELECT ARRAY[1, 2, 3, 4, 5] arrB)
SELECT *
FROM ArrayA a CROSS JOIN
     ArrayB b
WHERE NOT EXISTS (SELECT a_el
                  FROM UNNEST(a.arrA) a_el LEFT JOIN
                       UNNEST(b.arrB) b_el
                       ON a_el = b_el
                  WHERE b_el IS NULL
                 ) AND
      NOT EXISTS (SELECT COUNT(*)
                  FROM UNNEST(a.arrA) a_el LEFT JOIN
                       UNNEST(b.arrB) b_el
                       ON a_el = b_el
                  HAVING COUNT(*) <> COUNT(b_el)
                 ) ;

Upvotes: 1

Sabri Karag&#246;nen
Sabri Karag&#246;nen

Reputation: 2365

You can check for every item in arrA, and then get minimum of it. If all the items of arrA in arrB, there will be 3 trues, so the minimum will be true. If at least one of them is not in arrB, there will be 2 true and 1 false, so the minimum will be false.

WITH
ArrayA AS (
    SELECT [1, 2, 3] arrA,
    UNION ALL
    SELECT [4, 5, 6]
),
ArrayB AS (
  SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
  *, 
  (
    SELECT min(a in UNNEST(arrB))
    FROM UNNEST(arrA) as a
  ) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB

You can also make it a function and use it in many places. Sorry for bad naming :)

CREATE TEMP FUNCTION is_array_in_array(subset ARRAY<int64>, main ARRAY<int64>) AS ((SELECT min(a in UNNEST(main)) FROM UNNEST(subset) as a));

WITH
ArrayA AS (
    SELECT [1, 2, 3] arrA,
    UNION ALL
    SELECT [4, 5, 6]
),
ArrayB AS (
  SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
  *, 
  is_array_in_array(arrA, arrB) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB

Upvotes: 6

Related Questions