Deviling Master
Deviling Master

Reputation: 3113

Google BigQuery: Table join with REPEATED RECORD values

I have a table like this:

Row field1  field2  field3.a    field3.b     
1   value1  1       id1         key5     
                    id2         key6     
2   value2  2       id3         key7     
                    id4         key8

The first 2 columns are standard fields, the third column is a REPEATED RECORD field.

I have 2 additional tables

Row  id     value    
1    id1    my-valueA    
2    id2    my-valueB    
3    id3    my-valueC    
4    id4    my-valueD

and

Row id      value    
1   key5    my-valueE    
2   key6    my-valueF    
3   key7    my-valueG    
4   key8    my-valueH

Which map the a id/key from the first table (I used the term key and id to avoid confusion, but at the end the concept is the same) with a given value

Here is a full statement to reproduce the structure

#standardSQL
WITH my_table AS (
  SELECT "value1" as field1, 1 as field2, [STRUCT("id1" as a, "key5" as b),STRUCT("id2" as a, "key6" as b)] as field3
  UNION ALL
  SELECT "value2" as field2, 2 as field2, [STRUCT("id3" as a, "key7" as b),STRUCT("id4" as a, "key8" as b)] as field3
),

ids_table AS (
  SELECT "id1" as id, "my-valueA" as value
  UNION ALL
  SELECT "id2" as id, "my-valueB" as value
  UNION ALL
  SELECT "id3" as id, "my-valueC" as value
  UNION ALL
  SELECT "id4" as id, "my-valueD" as value
),

keys_table AS (
  SELECT "key5" as id, "my-valueE" as value
  UNION ALL
  SELECT "key6" as id, "my-valueF" as value
  UNION ALL
  SELECT "key7" as id, "my-valueG" as value
  UNION ALL
  SELECT "key8" as id, "my-valueH" as value
)

-- SELECT * FROM my_table
-- SELECT * FROM ids_table
-- SELECT * FROM keys_table

My goal is to replace the key/id values from the first tables with the value given by the 2 other tables, like a classic join on id.

Here is the expected output

Row field1  field2  t2_value    t3_value     
1   value1  1       my-valueA   my-valueE    
                    my-valueB   my-valueF    
2   value2  2       my-valueC   my-valueG    
                    my-valueD   my-valueH

At first I thought about using UNNEST operator to obtain flat lines, so a simple JOIN can be made to resolve the value and after that rejoin the array with the replaced values.

SELECT my_table.* EXCEPT(field3),
  t2.value as t2_value,
  t3.value as t3_value
FROM my_table CROSS JOIN UNNEST(my_table.field3) AS t1 
  LEFT JOIN ids_table AS t2 ON t1.a = t2.id
  LEFT JOIN keys_table as t3 ON t1.b = t3.id

With this statement the values are properly replace from the id to the value, but now I'm unable to reproduce the RECORD REPEATED structure of before

Upvotes: 3

Views: 2406

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172964

Below is for BigQuery Standard SQL

#standardSQL
WITH my_table AS (
  SELECT "value1" AS field1, 1 AS field2, [STRUCT("id1" AS a, "key5" AS b),STRUCT("id2" AS a, "key6" AS b)] AS field3 UNION ALL
  SELECT "value2" AS field2, 2 AS field2, [STRUCT("id3" AS a, "key7" AS b),STRUCT("id4" AS a, "key8" AS b)] AS field3
), ids_table AS (
  SELECT "id1" AS id, "my-valueA" AS value UNION ALL
  SELECT "id2" AS id, "my-valueB" AS value UNION ALL
  SELECT "id3" AS id, "my-valueC" AS value UNION ALL
  SELECT "id4" AS id, "my-valueD" AS value
),keys_table AS (
  SELECT "key5" AS id, "my-valueE" AS value UNION ALL
  SELECT "key6" AS id, "my-valueF" AS value UNION ALL
  SELECT "key7" AS id, "my-valueG" AS value UNION ALL
  SELECT "key8" AS id, "my-valueH" AS value
)
SELECT 
  field1, field2,
  (
    SELECT ARRAY_AGG(STRUCT<a_value STRING, b_value STRING>(t2.value, t3.value)) 
    FROM UNNEST(field3) t1
      LEFT JOIN ids_table AS t2 ON t1.a = t2.id
      LEFT JOIN keys_table AS t3 ON t1.b = t3.id
  ) AS field3
FROM my_table    

with below output

field1  field2  field3.a_value  field3.b_value   
value1  1       my-valueA       my-valueE    
                my-valueB       my-valueF    
value2  2       my-valueC       my-valueG    
                my-valueD       my-valueH      

Upvotes: 3

Related Questions