Reputation: 912
Are these two SQL statements identical in all scenarios?
SELECT DISTINCT * FROM
(SELECT * FROM tablename UNION ALL SELECT * FROM tablename2) dummyname
and
SELECT * FROM tablename UNION SELECT * FROM tablename2
The thing I always find peculiar about UNION is that it removes all non-distinct rows even if those non-distinct rows appear within a single part of the UNION clause. For example, lets say tablename has one column: amt INT. Let's say amt INT is NOT a primary key and does not have any unique constraint on it. Then let's say I have two rows where amt = 20. Regardless of the UNION that I do, only one of those two rows will appear in the resulting set.
I was explaining this behavior to someone else, and then came up with above examples to demonstrate the behavior of UNION... I believe those two statements above are the same (always), but I just want to make sure that what I said is correct. Are there any edge cases where the two SQL statements above will not generate a totally identical result set?
Upvotes: 3
Views: 4700
Reputation: 3810
In short yes you can find the explanation here:
/* Declare First Table */
DECLARE @Table1 TABLE (ColDetail VARCHAR(10))
INSERT INTO @Table1
SELECT 'First'
UNION ALL
SELECT 'Second'
UNION ALL
SELECT 'Third'
UNION ALL
SELECT 'Fourth'
UNION ALL
SELECT 'Fifth'
/* Declare Second Table */
DECLARE @Table2 TABLE (ColDetail VARCHAR(10))
INSERT INTO @Table2
SELECT 'First'
UNION ALL
SELECT 'Third'
UNION ALL
SELECT 'Fifth'
SELECT DISTINCT * FROM (SELECT * FROM @Table1 UNION ALL SELECT * FROM @Table2) dummyname
result 1:
SELECT * FROM @Table1 UNION SELECT * FROM @Table2
result 2:
Upvotes: 2
Reputation: 1269973
The two versions are identical: UNION
is functionally equivalent to SELECT DISTINCT
on the UNION ALL
.
UNION
removes duplicate values within a table as well as values shared between the two tables.
I should note that from a performance point of view, the following is often faster than just a UNION
:
select distinct t.col1, t.col2
from t
union
select distinct t1.col1, t.col2
from t1;
This may be counterintuitive. But, if there is an index on (col1, col2)
in the tables, then it can be used for the select distinct
, reducing the number of rows for the union
. Some optimizers might miss this opportunity to simplify the inputs.
Upvotes: 8