Reputation: 19648
I have two tables which are pretty wide (30 columns) but have very similar schema (20+ columns shared across both tables). I want to union those two tables together but want the new table to have all the fields from both tables.
Similar to the idea from this Stackoverflow question.
However, the challenges arise when I start writing the queries that I have to specify all the columns from each table and not only that, but also use NULL to fill in the columns that is only present from the other table.
Like this:
select
commoncolumn1,
commoncolumn2,
table1_only_column1,
NULL as table2_only_column1
...
from table1
union all
select
commoncolumn1,
commoncolumn2,
NULL as table1_only_column1,
table2_only_column1
...
from table2
This query gets really long as the number of columns increases and also not quite robust to schema changes. Is there a better way of union two tables which auto fill NULLs for non-present tables?
I am using Impala/Hive but if there is an ANSI way of doing this, that will be great!
Upvotes: 1
Views: 6806
Reputation: 18408
FWIW, SQL UNION will match columns by ordinal position, not by name. If the name of a column is identical in both SELECTs then that is also the name for the column of the result. If not, then the column name for the result is "implementation-defined".
The meaning/consequence of all of this is that the onus is on you to beware of both the ordering of the columns in your SELECTs as well as the identical naming of those columns (if you want subsequent reference to those columns in the result of the UNION to be possible by using a name instead of a column number).
That's what the standard mandates compliant implementations to behave like, so you have very little chance of finding an implementation that can relieve you of the sufferings this causes for you.
Upvotes: 1
Reputation: 2341
From Wikipedia (emphasis mine):
In SQL the UNION clause combines the results of two SQL queries into a single table of all matching rows. The two queries must result in the same number of columns and compatible data types in order to unite.
So in short, you must specify NULL
for non-applicable columns. You could script something to generate the SQL, or use the same column (with another column indicating which attribute it is), but you're kind of limited.
Not being familiar with Hive/Impala, there might be better options there.
For what it's worth, here's the MySQL reference page for UNION
. Interestingly, it doesn't explicitly state that the same number of columns is required (although that might just be because it's assumed).
As @AlexM notes, SELECT *
might be an option. However, you want to be careful with this, since the order of the columns might change, or new columns might be added, which would break the UNION
query (standard warnings against using SELECT *
).
Upvotes: 0