Canovice
Canovice

Reputation: 10173

BigQuery how to automatically handle "duplicate column names" on left join

I am working with a dataset of tables that (a) often requires joining tables together, however also (b) frequently has duplicate columns names. Any time I write a query along the lines of:

SELECT 
  t1.*, t2.*
FROM t1 
LEFT JOIN t2 ON t1.this_id = t2.matching_id

...I get the error Duplicate column names in the result are not supported. Found duplicate(s): this_col, that_col, another_col, more_cols, dupe_col, get_the_idea_col

I understand that with BigQuery, it is better to avoid using * when selecting tables, however my data tables aren't too big + my bigquery budget is high, and doing these joins with all columns helps significantly with data exploration.

Is there anyway BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query all together?

Thanks!

Upvotes: 1

Views: 4671

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Is there anyway BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query all together?

This is possible with BigQuery Legacy SQL - which can be handy for data exploration unless you are dealing with data types or using some functions/features specific to standard sql

So below

#legacySQL
SELECT t1.*, t2.*
FROM table1 AS t1 
LEFT JOIN table2 AS t2 
ON t1.this_id = t2.matching_id   

will produce output where all column names will be prefixed with respective alias like t1_this_id and t2_matching_id

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

The simplest way is to select records rather than columns:

SELECT t1, t2
FROM t1 LEFT JOIN
     t2
     ON t1.this_id = t2.matching_id;

This is pretty much what I do for ad hoc queries.

If you want the results as columns and not records (they don't look much different in the results), you can use EXCEPT:

SELECT t1.* EXCEPT (duplicate_column_name),
       t2.* EXCEPT (duplicate_column_name),
       t1.duplicate_column_name as t1_duplicate_column_name,
       t2.duplicate_column_name as t2_duplicate_column_name
FROM t1 LEFT JOIN
     t2
     ON t1.this_id = t2.matching_id;

Upvotes: 6

Related Questions