BigQuery Join 2 tables based on (Array CONTAINED IN Array) condition

Question

I am trying to achieve the following. Suppose I have two tables:

WITH table_1 as (
SELECT
* FROM UNNEST([
  STRUCT([1] as A, [2,3,4] as B),
  STRUCT([2],[6,7])
  ])
)

Table 2:

WITH example as (
SELECT
* FROM UNNEST([
  STRUCT([1,2] as C, [77] as D),
  STRUCT([3,4],[88]),
  STRUCT([4],[99])
  ])
)

I would like to merge table_1 and table_2 based on the following condition that all values of C must be in B:

SELECT A, C, D FROM table_1 LEFT JOIN table_2 ON C CONTAINED IN B

This would result in the following table:

My question is if/how it is possible to get the desired result. I was not able to write a CONTAINED IN statement for two arrays as a condition for a LEFT JOIN statement. One additional requirement is that table 1 contains 100 million rows and table 2 25 thousand. Therefore the solution must be efficient. I know that increases the difficulty of the question... :P

Your help would be greatly appreciated!

Ranga Vure · Accepted Answer

WITH table_1 as (
SELECT
* FROM UNNEST([
  STRUCT([1] as A, [2,3,4] as B),
  STRUCT([2],[6,7])
  ])
),
table_2 as (
SELECT
* FROM UNNEST([
  STRUCT([1,2] as C, [77] as D),
  STRUCT([3,4],[88]),
  STRUCT([4],[99])
  ])
)

SELECT table_1.A, table_2.C, table_2.D
FROM table_1 , table_2 , UNNEST([
                      (SELECT ARRAY_LENGTH(table_2.C) - COUNT(1) 
                      FROM UNNEST(table_2.C) AS col_c 
                      JOIN UNNEST(table_1.B)  AS col_b 
                      ON col_c = col_b)]) AS x
WHERE x = 0

which results the desired output.

BigQuery Join 2 tables based on (Array CONTAINED IN Array) condition

Answers (1)

Related Questions