SQL: Is UNION the same as SELECT DISTINCT with UNION ALL subquery

Question

Are these two SQL statements identical in all scenarios?

SELECT DISTINCT * FROM 
(SELECT * FROM tablename UNION ALL SELECT * FROM tablename2) dummyname

and

SELECT * FROM tablename UNION SELECT * FROM tablename2

The thing I always find peculiar about UNION is that it removes all non-distinct rows even if those non-distinct rows appear within a single part of the UNION clause. For example, lets say tablename has one column: amt INT. Let's say amt INT is NOT a primary key and does not have any unique constraint on it. Then let's say I have two rows where amt = 20. Regardless of the UNION that I do, only one of those two rows will appear in the resulting set.

I was explaining this behavior to someone else, and then came up with above examples to demonstrate the behavior of UNION... I believe those two statements above are the same (always), but I just want to make sure that what I said is correct. Are there any edge cases where the two SQL statements above will not generate a totally identical result set?

Gordon Linoff · Accepted Answer

The two versions are identical: UNION is functionally equivalent to SELECT DISTINCT on the UNION ALL.

UNION removes duplicate values within a table as well as values shared between the two tables.

I should note that from a performance point of view, the following is often faster than just a UNION:

select distinct t.col1, t.col2
from t
union
select distinct t1.col1, t.col2
from t1;

This may be counterintuitive. But, if there is an index on (col1, col2) in the tables, then it can be used for the select distinct, reducing the number of rows for the union. Some optimizers might miss this opportunity to simplify the inputs.

SQL: Is UNION the same as SELECT DISTINCT with UNION ALL subquery

Answers (2)

Related Questions