Canovice
Canovice

Reputation: 10173

BigQuery union tables with different columns, fill in with nulls

I have a fairly straightforward UNION question. I am trying to Union several tables together that have overlapping, but not identical, column names. For example:

SELECT 
  'joe' name, 'bills' team, false as is_o,
  7 as stat1, 95 as stat1_pctile, 
  9 as stat2, 82 as stat2_pctile
UNION ALL
SELECT 
  'joe', 'bills', true as is_o,
  1 as stat1, 37 as stat1_pctile,
  2 as stat3, 17 as stat3_pctile
UNION ALL
SELECT 
  'tim' name, 'jets' team, false as is_o,
  12 as stat2, 85 as stat2_pctile,
  13 as stat3, 69 as stat3_pctile

This currently returns the following table:

Row name    team    is_o    stat1   stat1_pctile    stat2   stat2_pctile    
1   joe     bills   FALSE   7       95              9       82  
2   joe     bills   TRUE    1       37              2       17  
3   tim     jets    FALSE   12      85             13       69  

however I am seeking this table:

Row name    team    is_o    stat1   stat1_pctile    stat2   stat2_pctile  stat3   stat3_pctile  
1   joe     bills   FALSE   7       95              9       82            null    null
2   joe     bills   TRUE    1       37              null .  null          2       17    
3   tim     jets    FALSE   null .  null            12      85            13      69    

Is this possible using UNION? Clearly UNION ALL is not doing the trick, as it expects (a) identical # of columns, (b) identical columns names, all (c) in the same order.

Upvotes: 2

Views: 3749

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

You can do the trick using BigQuery Legacy SQL as in example below

#legacySQL
SELECT * FROM (
SELECT 
  'joe' name, 'bills' team, FALSE AS is_o,
  7 AS stat1, 95 AS stat1_pctile, 
  9 AS stat2, 82 AS stat2_pctile
), (
SELECT 
  'joe' name, 'bills' team, TRUE AS is_o,
  1 AS stat1, 37 AS stat1_pctile,
  2 AS stat3, 17 AS stat3_pctile
), (
SELECT 
  'tim' name, 'jets' team, FALSE AS is_o,
  12 AS stat2, 85 AS stat2_pctile,
  13 AS stat3, 69 AS stat3_pctile
)   

above returns

Row name    team    is_o    stat1   stat1_pctile    stat2   stat2_pctile    stat3   stat3_pctile     
1   joe     bills   false   7       95              9       82              null    null     
2   joe     bills   true    1       37              null    null            2       17   
3   tim     jets    false   null    null            12      85              13      69   

Note: in legacy mode comma means union all

Upvotes: 3

Oscar
Oscar

Reputation: 141

For each of the columns that is missing from one of your tables, you can select it as null:

SELECT 
  'joe' name, 'bills' team, false as is_o,
  7 as stat1, 95 as stat1_pctile, 
  9 as stat2, 82 as stat2_pctile,
  null as stat3, null as stat3_pctile
UNION ALL
SELECT 
  'joe', 'bills', true as is_o,
  1 as stat1, 37 as stat1_pctile,
  null as stat2, null as stat2_pctile,
  2 as stat3, 17 as stat3_pctile
UNION ALL
SELECT 
  'tim' name, 'jets' team, false as is_o,
  null as stat1, null as stat1_pctile,
  12 as stat2, 85 as stat2_pctile,
  13 as stat3, 69 as stat3_pctile

You also need to select them in the same order in each SELECT, as BigQuery's union doesn't pay attention to the column names, it just mashes the tables together. Eg. the following would not respect column names in the union:

SELECT 
  'joe' name, 'bills' team, false as is_o,
  7 as stat1, 95 as stat1_pctile, 
  9 as stat2, 82 as stat2_pctile
UNION ALL
SELECT 
  'bills', 'joe', true as is_o,
  null as stat2, null as stat2_pctile,
  1 as stat1, 37 as stat1_pctile

enter image description here

Upvotes: 1

Related Questions