Maximilian
Maximilian

Reputation: 8500

BigQuery: Union / Join two tables with many columns, some overlapping

I have two large tables with some overlapping columns, some of which contain the same values in the overlapping columns. Here's a toy example (in the actual example, there are dozens of columns, both those that overlap and those that don't):

Table 1: a, b, c

Table 2: a, d, e

Some values of a are in only one table, some are in both.

Is there a query that will let me generate a table with all values where available:

Table 3: a, b, c, d, e

My current query requires listing every column, which is very verbose with dozens of columns, and inflexible when the schema changes:

SELECT
  coalesce(t1.a,
    t2.a) AS a,
  t1.b,
  t1.c,
  t2.d,
  t2.e
FROM
  t1
FULL JOIN
  t2
USING
  (a)

Things I've tried: UNION seems to require the same schema, SELECT t1.*, t2.* raises an error on overlapping columns, SELECT t1.* ... USING (a) will give nulls for values in a where there are values only in t1.a.

Upvotes: 0

Views: 3374

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

Before BigQuery Standard SQL got available to all of us in June 2, 2016 - I was extremely happy with what now called BigQuery Legacy SQL. I still enjoy it time by time for some specific use cases

I think the case you described in your question is exactly one where you can leverage feature of Legacy SQL to resolve your issue

So, below is for BigQuery Legacy SQL

#legacySQL
SELECT * 
FROM [project:dataset.table1],
  [project:dataset.table2]

Note: in BigQuery Legacy SQL comma - , - means UNION ALL

Super-simplified example of above is

#legacySQL
SELECT * 
FROM (SELECT1 a, 2 b, 3 c, 11 x),
 (SELECT 1 a, 4 d, 5 e, 12 x)   

with result

Row     a   b       c       x       d       e    
1       1   2       3       11      null    null     
2       1   null    null    12      4       5    

Note: you cannot mix Legacy and Standard SQL in same query, so if you need use Standard SQL against resulted UNION - you will need first to materialize(save) result as a table and then query that table using Standard SQL

Is there any way with Standard SQL

You can use INFORMATION_SCHEMA to script out columns from both tables and built list of all involved columns - but you sutill will need then to copy-paste result into final query to run it

Upvotes: 2

Related Questions