dorachan2010
dorachan2010

Reputation: 1091

How do you compare two arrays in BigQuery?

I am trying to join two tables that each have an array column like the following

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array

or

SELECT a.id, b.value
FROM a INNER JOIN b
ON UNNEST(a.array) IN UNNEST(b.array)

According to this SO question, postgres has operators like <@ and >@ that compares if either is a subset of the other array ( postgres doc page) but BigQuery only allows an element of the array to be compared with the other array like the following

a.arrayelement IN UNNEST(b.array)

Can it be done in BigQuery?

edit

This is the schema I am working with

WITH b AS (
    {  "ip": "192.168.1.1",
      "cookie": [
        { "key": "apple",
          "value: "red"
        },
        { "key": "peach",
          "value: "pink"
        },
        { "key": "orange",
          "value: "orange"
        }
      ]
    }
    ,{  "ip": "192.168.1.2",
      "cookie": [
        { "key": "apple",
          "value: "red"
        },
        { "key": "orange",
          "value: "orange"
        }
      ]
    }
   ),
WITH a AS (
    {  "id": "12345",
      "cookie": [
        { "key": "peach",
          "value: "pink"
        }
      ]
    }
    ,{  "id": "67890",
      "cookie": [
        { "key": "apple",
          "value: "red"
        },
        { "key": "orange",
          "value: "orange"
        },

      ]
     }
)

I am expecting an output like the following

ip, id
192.168.1.1, 67890 
192.168.1.2, 67890 
192.168.1.2, 12345

It is a continuation of the following SO, How do I find elements in an array in BigQuery . I tried using subqueries to compare a single element of one of the array, but BigQuery returns an error saying that I have "too many subqueries"

Upvotes: 15

Views: 22160

Answers (2)

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14004

Here is an alternative solution, which avoids running JOIN in correlated subquery, and instead relies on IN UNNEST() expression - this should give better performance:

#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a , b
WHERE (SELECT LOGICAL_AND(a_i IN UNNEST(b.b_arr)) FROM UNNEST(a.a_arr) a_i)

Upvotes: 13

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173003

Try below example (BigQuery Standard SQL)

#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a , b , UNNEST([(SELECT ARRAY_LENGTH(a.a_arr) - COUNT(1) 
                      FROM UNNEST(a.a_arr) AS x 
                      JOIN UNNEST(b.b_arr)  AS y 
                      ON x = y)]) AS z
WHERE z = 0

it mimics below pseudo-code:

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array  

Let me know if you want me to apply this to your example - or you will try by yourself first :o)

Upvotes: 9

Related Questions