VKK
VKK

Reputation: 912

SQL: Is UNION the same as SELECT DISTINCT with UNION ALL subquery

Are these two SQL statements identical in all scenarios?

SELECT DISTINCT * FROM 
(SELECT * FROM tablename UNION ALL SELECT * FROM tablename2) dummyname

and

SELECT * FROM tablename UNION SELECT * FROM tablename2

The thing I always find peculiar about UNION is that it removes all non-distinct rows even if those non-distinct rows appear within a single part of the UNION clause. For example, lets say tablename has one column: amt INT. Let's say amt INT is NOT a primary key and does not have any unique constraint on it. Then let's say I have two rows where amt = 20. Regardless of the UNION that I do, only one of those two rows will appear in the resulting set.

I was explaining this behavior to someone else, and then came up with above examples to demonstrate the behavior of UNION... I believe those two statements above are the same (always), but I just want to make sure that what I said is correct. Are there any edge cases where the two SQL statements above will not generate a totally identical result set?

Upvotes: 3

Views: 4700

Answers (2)

Fuzzy
Fuzzy

Reputation: 3810

In short yes you can find the explanation here:

/* Declare First Table */
DECLARE @Table1 TABLE (ColDetail VARCHAR(10))
INSERT INTO @Table1
SELECT 'First'
UNION ALL
SELECT 'Second'
UNION ALL
SELECT 'Third'
UNION ALL
SELECT 'Fourth'
UNION ALL
SELECT 'Fifth'
/* Declare Second Table */
DECLARE @Table2 TABLE (ColDetail VARCHAR(10))
INSERT INTO @Table2
SELECT 'First'
UNION ALL
SELECT 'Third'
UNION ALL
SELECT 'Fifth'

SELECT DISTINCT * FROM (SELECT * FROM @Table1 UNION ALL SELECT * FROM @Table2) dummyname

result 1:

enter image description here

SELECT * FROM @Table1 UNION SELECT * FROM @Table2

result 2:

enter image description here

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269973

The two versions are identical: UNION is functionally equivalent to SELECT DISTINCT on the UNION ALL.

UNION removes duplicate values within a table as well as values shared between the two tables.

I should note that from a performance point of view, the following is often faster than just a UNION:

select distinct t.col1, t.col2
from t
union
select distinct t1.col1, t.col2
from t1;

This may be counterintuitive. But, if there is an index on (col1, col2) in the tables, then it can be used for the select distinct, reducing the number of rows for the union. Some optimizers might miss this opportunity to simplify the inputs.

Upvotes: 8

Related Questions