H_Du
H_Du

Reputation: 23

Can I do a full outer join with 3 different tables in Google BigQuery

SELECT * 
FROM `project.dataset1.table1` t1  
FULL OUTER JOIN `project.dataset2.table2` t2 
USING (Name) 
FULL OUTER JOIN `project.dataset3.table3` t3 
USING (Name) 
WHERE Name LIKE '%John%'

project.dataset1.table1 and project.dataset3.table3 both have the same column names.

I get the error "Duplicate column names in the result are not supported."

What is the correct way of combining these tables in BigQuery?

Upvotes: 2

Views: 1550

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Let me first explain what is going on here

In below example - there is no problem with duplicate column name rather than Name but it is addressed by having Name within USING() vs ON

#standardSQL 
WITH 
  `project.dataset1.table1` AS (SELECT 'John' Name, 'a' x ),  
  `project.dataset2.table2` AS (SELECT 'John' Name, 'b' y),  
  `project.dataset3.table3` AS (SELECT 'John' Name, 'c' z)
SELECT * 
FROM `project.dataset1.table1` t1  
FULL OUTER JOIN `project.dataset2.table2` t2 USING (Name) 
FULL OUTER JOIN `project.dataset3.table3` t3 USING (Name) 
WHERE Name LIKE '%John%'   

In the second example below, you see extra field dup_col that is duplicate (in table1 and in table2) - thus the error - Duplicate column names in the result are not supported

#standardSQL 
WITH 
  `project.dataset1.table1` AS (SELECT 'John' Name, 'a' x, 'qq' dup_col),  
  `project.dataset2.table2` AS (SELECT 'John' Name, 'b' y, 'ww' dup_col),  
  `project.dataset3.table3` AS (SELECT 'John' Name, 'c' z)
SELECT * 
FROM `project.dataset1.table1` t1  
FULL OUTER JOIN `project.dataset2.table2` t2 USING (Name) 
FULL OUTER JOIN `project.dataset3.table3` t3 USING (Name) 
WHERE Name LIKE '%John%'   

Finally, in below example you find how to address that issue

#standardSQL 
WITH 
  `project.dataset1.table1` AS (SELECT 'John' Name, 'a' x, 'qq' dup_col),  
  `project.dataset2.table2` AS (SELECT 'John' Name, 'b' y, 'ww' dup_col),  
  `project.dataset3.table3` AS (SELECT 'John' Name, 'c' z)
SELECT * EXCEPT(dup_col), 
  t1.dup_col AS t1_dup_col, 
  t2.dup_col AS t2_dup_col
FROM `project.dataset1.table1` t1  
FULL OUTER JOIN `project.dataset2.table2` t2 USING (Name) 
FULL OUTER JOIN `project.dataset3.table3` t3 USING (Name) 
WHERE Name LIKE '%John%'     

As you can see here, we added EXCEPT(dup_col) which eliminates that duplicate columns from output, but than we explicitly added those columns with distinct aliases

So, now there is no duplicate columns in output and it is as below

Row Name    x   y   z   t1_dup_col  t2_dup_col   
1   John    a   b   c   qq          ww      

Hope, you will be able to apply this to your specific duplicate columns
If you have more than just one name - just list them in EXCEPT as

EXCEPT(dup_col1, dup_col2, dup_col3)   

Upvotes: 3

Related Questions