Reputation: 23
SELECT *
FROM `project.dataset1.table1` t1
FULL OUTER JOIN `project.dataset2.table2` t2
USING (Name)
FULL OUTER JOIN `project.dataset3.table3` t3
USING (Name)
WHERE Name LIKE '%John%'
project.dataset1.table1
and project.dataset3.table3
both have the same column names.
I get the error "Duplicate column names in the result are not supported."
What is the correct way of combining these tables in BigQuery?
Upvotes: 2
Views: 1550
Reputation: 172944
Let me first explain what is going on here
In below example - there is no problem with duplicate column name rather than Name
but it is addressed by having Name within USING()
vs ON
#standardSQL
WITH
`project.dataset1.table1` AS (SELECT 'John' Name, 'a' x ),
`project.dataset2.table2` AS (SELECT 'John' Name, 'b' y),
`project.dataset3.table3` AS (SELECT 'John' Name, 'c' z)
SELECT *
FROM `project.dataset1.table1` t1
FULL OUTER JOIN `project.dataset2.table2` t2 USING (Name)
FULL OUTER JOIN `project.dataset3.table3` t3 USING (Name)
WHERE Name LIKE '%John%'
In the second example below, you see extra field dup_col
that is duplicate (in table1 and in table2) - thus the error - Duplicate column names in the result are not supported
#standardSQL
WITH
`project.dataset1.table1` AS (SELECT 'John' Name, 'a' x, 'qq' dup_col),
`project.dataset2.table2` AS (SELECT 'John' Name, 'b' y, 'ww' dup_col),
`project.dataset3.table3` AS (SELECT 'John' Name, 'c' z)
SELECT *
FROM `project.dataset1.table1` t1
FULL OUTER JOIN `project.dataset2.table2` t2 USING (Name)
FULL OUTER JOIN `project.dataset3.table3` t3 USING (Name)
WHERE Name LIKE '%John%'
Finally, in below example you find how to address that issue
#standardSQL
WITH
`project.dataset1.table1` AS (SELECT 'John' Name, 'a' x, 'qq' dup_col),
`project.dataset2.table2` AS (SELECT 'John' Name, 'b' y, 'ww' dup_col),
`project.dataset3.table3` AS (SELECT 'John' Name, 'c' z)
SELECT * EXCEPT(dup_col),
t1.dup_col AS t1_dup_col,
t2.dup_col AS t2_dup_col
FROM `project.dataset1.table1` t1
FULL OUTER JOIN `project.dataset2.table2` t2 USING (Name)
FULL OUTER JOIN `project.dataset3.table3` t3 USING (Name)
WHERE Name LIKE '%John%'
As you can see here, we added EXCEPT(dup_col)
which eliminates that duplicate columns from output, but than we explicitly added those columns with distinct aliases
So, now there is no duplicate columns in output and it is as below
Row Name x y z t1_dup_col t2_dup_col
1 John a b c qq ww
Hope, you will be able to apply this to your specific duplicate columns
If you have more than just one name - just list them in EXCEPT as
EXCEPT(dup_col1, dup_col2, dup_col3)
Upvotes: 3