mtreadway
mtreadway

Reputation: 45

how to create a table join on elements in an Array in Google BigQuery

I have some data, contact_IDs, that are in an Array of Structs inside a table called deals as in the example below.

WITH deals AS (
  Select "012345" as deal_ID,
    [STRUCT(["abc"] as company_ID, [123,678,810] as contact_ID)]
      AS associations)
SELECT 
  deal_ID,
  ARRAY(
    SELECT AS STRUCT  
      ( SELECT STRING_AGG(CAST(id AS STRING), ', ') 
        FROM t.contact_ID id
      ) AS contact_ID 
    FROM d.associations t
  ) AS contacts
FROM deals d

The query above takes the contact_IDs in the associations array and list them separated by commas.

Row    deal_ID    contacts.contact_ID   
1      012345     123, 678, 810

But my problem now is that I need to replace the contact_IDs with with first and last names from another table called contacts that looks like the following where contact_ID is INT64 and the name fields are Strings.

contact_id    first_name   last_name
123           Jane         Doe
678           John         Smith
810           Alice        Acre

I've attempted doing it with a subquery like this:

WITH deals AS (
  Select "012345" as deal_ID,
    [STRUCT(["abc"] as company_ID, [123,678,810] as contact_ID)]
      AS associations)
SELECT 
  deal_ID,
  ARRAY(
    SELECT AS STRUCT 
      company_ID, 
      ( SELECT STRING_AGG(
        (select concat(c.first_name, " ", c.last_name)
         from contacts c
         where c.contact_id=id), ', ') 
        FROM t.contact_ID id
      ) AS contact_name 
    FROM d.associations t
  ) AS contacts
FROM deals d

But this gives an error "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN." But I can't figure out how to make a join between deals.associations.contact_ID and contacts.contact_id when the thing I need to be joining on is inside the deals.associations array... Thanks in advance for any guidance.

Upvotes: 1

Views: 506

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

#standardSQL
SELECT deal_ID, 
  ARRAY_AGG(STRUCT(company_ID, contact_name)) AS contacts
FROM (
  SELECT 
    deal_ID, 
    ANY_VALUE(company_ID) AS company_ID, 
    STRING_AGG(FORMAT('%s %s', IFNULL(first_name, ''), IFNULL(last_name, '')), ', ') AS contact_name
  FROM deals d, 
    d.associations AS contact, 
    contact.contact_ID AS contact_ID
  LEFT JOIN contacts c
  USING(contact_ID)
  GROUP BY deal_ID, FORMAT('%t', company_ID)
)
GROUP BY deal_ID   

if applied to sample data from your question - output is

Row deal_ID     contacts.company_ID     contacts.contact_name    
1   012345      abc                     Jane Doe, John Smith, Alice Acre    

Note - below

  FROM deals d, 
    d.associations AS contact, 
    contact.contact_ID AS contact_ID   

is a shortcut for

  FROM deals, 
    UNNEST(associations) AS contact, 
    UNNEST(contact_ID) AS contact_ID

Somehow - this is my preference when possible not to use explicit UNNEST() in the query text

Upvotes: 1

Related Questions